Abstract

The aim of this paper is to design the polynomial construction of 
a finite recognizer for hairpin completions of regular languages. This 
is achieved by considering completions as new expression operators and 
by applying derivation techniques to the associated extended expressions 
called hairpin expressions. More precisely, we extend partial derivation of 
regular expressions to two-sided partial derivation of hairpin expressions 
and we show how to deduce a recognizer for a hairpin expression from its 
two-sided derived term automaton, providing an alternative proof of the 
fact that hairpin completions of regular languages are linear context-free. 

1 Introduction 

The aim of this paper is to design the polynomial construction of a finite recognizer for hairpin completions of regular languages. Given an integer $k > 0$ and an involution $H$ over an alphabet $\Gamma$, the hairpin $k$-completion of two languages $L_1$ and $L_2$ over $\Gamma$ is the language $H_k(L_1, L_2) = \{\alpha\beta\gamma H(\beta)H(\alpha) \mid \alpha, \beta, \gamma \in \Gamma^* \wedge (\alpha\beta\gamma H(\beta) \in L_1 \vee \beta\gamma H(\beta)H(\alpha) \in L_2) \wedge |\beta| = k\}$ (see Figure 1). Hairpin completion has been deeply studied.
The hairpin completion of formal languages was introduced in [5] because of its applications to biochemistry. It gave rise to numerous studies that investigate theoretical and algorithmic properties of hairpin completions or related operations. One of the most recent results concerns the problem of deciding regularity of hairpin completions of regular languages; it can be found in [11], together with a complete bibliography about hairpin completion.
Figure 1: The Hairpin Completion.



Hairpin completions of regular languages were proved to be linear context-free
in [9]. An alternative proof is presented in this paper, with a somewhat more
constructive approach, since it provides a recognizer for the hairpin completion.
This is achieved by considering completions as new expression operators and by
applying derivation techniques to the associated extended expressions, which we
call hairpin expressions. Notice that a similar derivation-based approach has
been used to study approximate regular expressions [8], through the definition
of new distance operators.

Two-sided derivation is shown to be particularly suitable for the study of
hairpin expressions. More precisely, we extend partial derivation of regular
expressions [1] to two-sided partial derivation of regular expressions first and
then of hairpin expressions. We prove that the set of two-sided derived terms
of a hairpin expression $E$ over an alphabet $\Gamma$ is finite. Hence the two-sided
derived term automaton $A$ is a finite one. Furthermore the automaton $A$ is over
the alphabet $(\Gamma \cup \{\varepsilon\})^2$ and, as we prove, the language over $\Gamma$ of such an
automaton is linear context-free and not necessarily regular. Finally we show
that the language of the hairpin expression $E$ and the language over $\Gamma$ of the
automaton $A$ are equal.

This paper is an extended version of the conference paper [2]. It is organized
as follows. The next section gathers useful definitions and properties concerning
automata and regular expressions. The notion of two-sided residual of a language
is introduced in Section 3, as well as the related notion of $\Gamma$-couple automaton.
In Section 4, hairpin completions of regular languages and their two-sided
residuals are investigated. The two-sided partial derivation of hairpin expressions
is considered in Section 5, leading to the construction of a finite recognizer. A
specific case is examined in Section 6.

2 Preliminaries 

An alphabet is a finite set of distinct symbols. Given an alphabet $\Sigma$, we denote by
$\Sigma^*$ the set of all the words over $\Sigma$. The empty word is denoted by $\varepsilon$. A language
over $\Sigma$ is a subset of $\Sigma^*$. The three operations $\cup$, $\cdot$ and $^*$ are defined for any
two languages $L_1$ and $L_2$ over $\Sigma$ by: $L_1 \cup L_2 = \{w \in \Sigma^* \mid w \in L_1 \vee w \in L_2\}$,
$L_1 \cdot L_2 = \{w_1 w_2 \in \Sigma^* \mid w_1 \in L_1 \wedge w_2 \in L_2\}$, $L_1^* = \{\varepsilon\} \cup \{w_1 \cdots w_k \in \Sigma^* \mid \forall j \in \{1, \ldots, k\}, w_j \in L_1\}$.
The family of regular languages over $\Sigma$ is the smallest
family $\mathcal{F}$ closed under the three operations $\cup$, $\cdot$ and $^*$ satisfying $\emptyset \in \mathcal{F}$ and
$\forall a \in \Sigma$, $\{a\} \in \mathcal{F}$. Regular languages can be represented by regular expressions.
A regular expression over $\Sigma$ is inductively defined by: $E = a$, $E = \varepsilon$, $E = \emptyset$,
$E = F + G$, $E = F \cdot G$, $E = F^*$, where $a$ is any symbol in $\Sigma$ and $F$ and
$G$ are any two regular expressions over $\Sigma$. The width of $E$ is the number of
occurrences of symbols in $E$, and its star number the number of occurrences of
the operator $^*$. The language denoted by $E$ is the language $L(E)$ inductively
defined by: $L(a) = \{a\}$, $L(\varepsilon) = \{\varepsilon\}$, $L(\emptyset) = \emptyset$, $L(F + G) = L(F) \cup L(G)$,
$L(F \cdot G) = L(F) \cdot L(G)$, $L(F^*) = (L(F))^*$, where $a$ is any symbol in $\Sigma$ and
$F$ and $G$ are any two regular expressions over $\Sigma$. The language denoted by a
regular expression is regular.

Let $w$ be a word in $\Sigma^*$ and $L$ be a language. The left residual (resp. right
residual) of $L$ w.r.t. $w$ is the language $w^{-1}(L) = \{w' \in \Sigma^* \mid ww' \in L\}$ (resp.
$(L)w^{-1} = \{w' \in \Sigma^* \mid w'w \in L\}$). It has been shown that the set of the left
residuals (resp. right residuals) of a language is a finite set if and only if the
language is regular.

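An automaton (or an NFA) over an alphabet $\Sigma$ is a 5-tuple $A = (\Sigma, Q, I, F, \delta)$
where $\Sigma$ is an alphabet, $Q$ a finite set of states, $I \subseteq Q$ the set of initial states,
$F \subseteq Q$ the set of final states and $\delta$ the transition function from $Q \times \Sigma$ to $2^Q$.
The domain of the function $\delta$ can be extended to $2^Q \times \Sigma^*$ as follows: for any
word $w$ in $\Sigma^*$, for any symbol $a$ in $\Sigma$, for any set of states $P \subseteq Q$, for any state
$p \in Q$, $\delta(P, \varepsilon) = P$, $\delta(p, aw) = \delta(\delta(p, a), w)$ and $\delta(P, w) = \bigcup_{p \in P} \delta(p, w)$.

The language recognized by the automaton $A$ is the set $L(A) = \{w \in \Sigma^* \mid \delta(I, w) \cap F \neq \emptyset\}$.
Given a state $q$ in $Q$, the right language of $q$ is the set
$\overrightarrow{L}(q) = \{w \in \Sigma^* \mid \delta(q, w) \cap F \neq \emptyset\}$. It can be shown that
(1) $L(A) = \bigcup_{i \in I} \overrightarrow{L}(i)$,
(2) $\overrightarrow{L}(q) = \{\varepsilon \mid q \in F\} \cup \bigcup_{a \in \Sigma,\, p \in \delta(q, a)} \{a\} \cdot \overrightarrow{L}(p)$ and
(3) $a^{-1}(\overrightarrow{L}(q)) = \bigcup_{p \in \delta(q, a)} \overrightarrow{L}(p)$.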

Kleene's Theorem [15] asserts that a language is regular if and only if there
exists an NFA that recognizes it. As a consequence, for any language $L$, there
exists a regular expression $E$ such that $L(E) = L$ if and only if there exists an
NFA $A$ such that $L(A) = L$. Conversion methods from an NFA to a regular
expression and vice versa have been deeply studied. In this paper, we focus on
the notion of partial derivative defined by Antimirov [1]^1.

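Given a regular expression $E$ over an alphabet $\Sigma$ and a word $w$ in $\Sigma^*$, the left
partial derivative of $E$ w.r.t. $w$ is the set $\frac{\partial}{\partial_w}(E)$ of regular expressions satisfying:
$\bigcup_{E' \in \frac{\partial}{\partial_w}(E)} L(E') = w^{-1}(L(E))$.

This set is inductively computed as follows: for any two regular expressions
$F$ and $G$, for any word $w$ in $\Sigma^*$ and for any two distinct symbols $a$ and $b$ in $\Sigma$,

$\frac{\partial}{\partial_a}(a) = \{\varepsilon\}$, $\frac{\partial}{\partial_a}(b) = \frac{\partial}{\partial_a}(\varepsilon) = \frac{\partial}{\partial_a}(\emptyset) = \emptyset$,

$\frac{\partial}{\partial_a}(F + G) = \frac{\partial}{\partial_a}(F) \cup \frac{\partial}{\partial_a}(G)$, $\frac{\partial}{\partial_a}(F^*) = \frac{\partial}{\partial_a}(F) \cdot F^*$,

$\frac{\partial}{\partial_a}(F \cdot G) = \begin{cases} \frac{\partial}{\partial_a}(F) \cdot G \cup \frac{\partial}{\partial_a}(G) & \text{if } \varepsilon \in L(F), \\ \frac{\partial}{\partial_a}(F) \cdot G & \text{otherwise,} \end{cases}$

$\frac{\partial}{\partial_{aw}}(F) = \frac{\partial}{\partial_w}\left(\frac{\partial}{\partial_a}(F)\right)$, $\frac{\partial}{\partial_\varepsilon}(F) = \{F\}$,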



^Partial derivation is investigated in the more general framework of weighted expressions 

HZ]. 
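where for any set $\mathcal{E}$ of regular expressions, for any word $w$ in $\Sigma^*$, for any
regular expression $F$, $\frac{\partial}{\partial_w}(\mathcal{E}) = \bigcup_{E \in \mathcal{E}} \frac{\partial}{\partial_w}(E)$ and $\mathcal{E} \cdot F = \bigcup_{E \in \mathcal{E}} \{E \cdot F\}$. Any
expression appearing in a left partial derivative is called a left derived term.
Similarly, the right partial derivative of a regular expression $E$ over an alphabet
$\Sigma$ w.r.t. a word $w$ in $\Sigma^*$ is the set $(E)\frac{\partial}{\partial_w}$ inductively defined as follows for any
two regular expressions $F$ and $G$, for any word $w$ in $\Sigma^*$ and for any two distinct
symbols $a$ and $b$ in $\Sigma$,

$(a)\frac{\partial}{\partial_a} = \{\varepsilon\}$, $(b)\frac{\partial}{\partial_a} = (\varepsilon)\frac{\partial}{\partial_a} = (\emptyset)\frac{\partial}{\partial_a} = \emptyset$,

$(F + G)\frac{\partial}{\partial_a} = (F)\frac{\partial}{\partial_a} \cup (G)\frac{\partial}{\partial_a}$, $(F^*)\frac{\partial}{\partial_a} = F^* \cdot (F)\frac{\partial}{\partial_a}$,

$(F \cdot G)\frac{\partial}{\partial_a} = \begin{cases} F \cdot (G)\frac{\partial}{\partial_a} \cup (F)\frac{\partial}{\partial_a} & \text{if } \varepsilon \in L(G), \\ F \cdot (G)\frac{\partial}{\partial_a} & \text{otherwise,} \end{cases}$

$(F)\frac{\partial}{\partial_{wa}} = \left((F)\frac{\partial}{\partial_a}\right)\frac{\partial}{\partial_w}$, $(F)\frac{\partial}{\partial_\varepsilon} = \{F\}$,

where for any set $\mathcal{E}$ of regular expressions, for any word $w$ in $\Sigma^*$, for any
regular expression $F$, $(\mathcal{E})\frac{\partial}{\partial_w} = \bigcup_{E \in \mathcal{E}} (E)\frac{\partial}{\partial_w}$ and $F \cdot \mathcal{E} = \bigcup_{E \in \mathcal{E}} \{F \cdot E\}$. Any
expression appearing in a right partial derivative is called a right derived term.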
We denote by $\overleftarrow{\mathcal{D}}_E$ (resp. $\overrightarrow{\mathcal{D}}_E$) the set of left (resp. right) derived terms of
the expression $E$. From the set of left derived terms of a regular expression $E$
of width $n$, Antimirov defined in [1] the derived term automaton $A$ of $E$ and
showed that $A$ is a $k$-state NFA that recognizes $L(E)$, with $k \le n + 1$.
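The left and right partial derivatives recalled above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: regular expressions are encoded as nested tuples, the constructor names `sym`, `cat`, `alt`, `star` and the helper `matches` are hypothetical, and no quotient is applied (so a term such as $\varepsilon \cdot F^*$ is kept as is).

```python
# Regular expressions as nested tuples (hypothetical encoding).
EPS = ("eps",)      # the expression epsilon
EMPTY = ("empty",)  # the expression denoting the empty language

def sym(a): return ("sym", a)
def cat(f, g): return ("cat", f, g)
def alt(f, g): return ("alt", f, g)
def star(f): return ("star", f)

def nullable(e):
    """True iff the empty word belongs to L(e)."""
    tag = e[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("sym", "empty"):
        return False
    if tag == "alt":
        return nullable(e[1]) or nullable(e[2])
    return nullable(e[1]) and nullable(e[2])  # cat

def left_pd(e, a):
    """Left partial derivative of e w.r.t. the symbol a (a set of expressions)."""
    tag = e[0]
    if tag == "sym":
        return {EPS} if e[1] == a else set()
    if tag in ("eps", "empty"):
        return set()
    if tag == "alt":
        return left_pd(e[1], a) | left_pd(e[2], a)
    if tag == "star":
        return {cat(d, e) for d in left_pd(e[1], a)}
    f, g = e[1], e[2]  # cat
    res = {cat(d, g) for d in left_pd(f, a)}
    if nullable(f):
        res |= left_pd(g, a)
    return res

def right_pd(e, a):
    """Right partial derivative of e w.r.t. the symbol a (mirror image of left_pd)."""
    tag = e[0]
    if tag == "sym":
        return {EPS} if e[1] == a else set()
    if tag in ("eps", "empty"):
        return set()
    if tag == "alt":
        return right_pd(e[1], a) | right_pd(e[2], a)
    if tag == "star":
        return {cat(e, d) for d in right_pd(e[1], a)}
    f, g = e[1], e[2]  # cat
    res = {cat(f, d) for d in right_pd(g, a)}
    if nullable(g):
        res |= right_pd(f, a)
    return res

def matches(e, w):
    """Membership test: derive from the left symbol by symbol, then test nullability."""
    terms = {e}
    for a in w:
        terms = set().union(*(left_pd(t, a) for t in terms))
    return any(nullable(t) for t in terms)
```

For instance, with `e = cat(star(sym("a")), cat(sym("b"), sym("c")))`, the call `matches(e, "aabc")` explores the left derived terms of $a^* \cdot (b \cdot c)$ one symbol at a time.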

A language over an alphabet $\Gamma$ is said to be linear context-free if it can be
generated by a linear grammar, that is a grammar equipped with productions
in one of the following forms:

1. $A \to xBy$, where $A$ and $B$ are any two non-terminal symbols, and $x$ and
$y$ are any two symbols in $\Gamma \cup \{\varepsilon\}$ such that $(x, y) \neq (\varepsilon, \varepsilon)$,

2. $A \to \varepsilon$, where $A$ is any non-terminal symbol.

Notice that the family of regular languages is strictly included in the family
of linear context-free languages. In the following, we will consider combinations
of left and right partial derivatives in order to deal with non-regular languages.



3 Two-sided Residuals of a Language and Couple 
NFA 

In this section, we extend residuals to two-sided residuals. This operation is the
composition of left and right residuals, but it is more powerful than classical
residuals since it allows one to compute a finite subset of the set of residuals even
for non-regular languages, which leads to the construction of a derivative-based
finite recognizer.

Definition 1. Let $L$ be a language over an alphabet $\Gamma$ and let $u$ and $v$ be
two words in $\Gamma^*$. The two-sided residual of $L$ w.r.t. $(u, v)$ is the language
$(u, v)^{-1}(L) = \{w \in \Gamma^* \mid uwv \in L\}$.

As mentioned above, the two-sided residual operation is the composition of
the two operations of left and right residuals.

Lemma 1. Let $L$ be a language over an alphabet $\Gamma$ and $u$ and $v$ be two words
in $\Gamma^*$. Then: $(u, v)^{-1}(L) = (u^{-1}(L))v^{-1} = u^{-1}((L)v^{-1})$.

Proof. Let $w$ be a word in $\Gamma^*$.

$w \in (u^{-1}(L))v^{-1} \Leftrightarrow wv \in u^{-1}(L) \Leftrightarrow uwv \in L \Leftrightarrow w \in (u, v)^{-1}(L)$

$\Leftrightarrow uwv \in L \Leftrightarrow uw \in (L)v^{-1} \Leftrightarrow w \in u^{-1}((L)v^{-1})$.

□

Corollary 1. Let $L$ be a language over an alphabet $\Gamma$ and $u$ and $v$ be two words
in $\Gamma^*$. Then: $\varepsilon \in (u, v)^{-1}(L) \Leftrightarrow uv \in L$.
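Definition 1, Lemma 1 and Corollary 1 can be checked on small finite languages. The following Python sketch is only an illustration: languages are assumed to be finite sets of strings, and the function names are hypothetical.

```python
def left_residual(L, u):
    """u^{-1}(L) for a finite language L given as a set of strings."""
    return {w[len(u):] for w in L if w.startswith(u)}

def right_residual(L, v):
    """(L)v^{-1} for a finite language L given as a set of strings."""
    return {w[:len(w) - len(v)] for w in L if w.endswith(v)}

def two_sided_residual(L, u, v):
    """(u, v)^{-1}(L) = {w | u w v in L}.

    The length test rules out words in which the prefix u and the
    suffix v would overlap.
    """
    return {
        w[len(u):len(w) - len(v)]
        for w in L
        if len(w) >= len(u) + len(v) and w.startswith(u) and w.endswith(v)
    }
```

On the language `{"abba", "aba", "ab", "a"}` and the couple `("a", "a")`, the three ways of computing the residual (directly, left then right, right then left) agree, as Lemma 1 states.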

It is folk knowledge that NFAs are related to left residual computation
according to the following assertion (A): in an NFA $(\Sigma, Q, I, F, \delta)$, a word
$aw$ belongs to $\overrightarrow{L}(q)$ with $q \in Q$ if and only if $w$ belongs to $a^{-1}(\overrightarrow{L}(q)) = \bigcup_{q' \in \delta(q, a)} \overrightarrow{L}(q')$.
Since a two-sided residual w.r.t. a couple $(x, y)$ of symbols in
an alphabet $\Gamma$ is by definition the combination of a left residual w.r.t. $x$ and of a
right residual w.r.t. $y$, the assertion (A) can be extended to two-sided residuals
by introducing couple NFAs equipped with transitions labelled by couples of
symbols in $\Gamma$. The notion of right language of a state is extended to the one
of $\Gamma$-right language as follows: if a given word $w$ in $\Gamma^*$ belongs to the $\Gamma$-right
language of a state $q'$ and if there exists a transition from a state $q$ to $q'$ labelled
by a couple $(x, y)$, then the word $xwy$ belongs to the $\Gamma$-right language of $q$.

More precisely, given an alphabet $\Gamma$, we set $\Sigma_\Gamma = \{(x, y) \mid x, y \in \Gamma \cup \{\varepsilon\} \wedge (x, y) \neq (\varepsilon, \varepsilon)\}$.
We consider the mapping $\mathrm{Im}$ from $(\Sigma_\Gamma)^*$ to $\Gamma^*$ inductively
defined for any word $w$ in $(\Sigma_\Gamma)^*$ and for any symbol $(x, y) \in \Sigma_\Gamma$ by: $\mathrm{Im}(\varepsilon) = \varepsilon$
and $\mathrm{Im}((x, y) \cdot w) = x \cdot \mathrm{Im}(w) \cdot y$. Notice that this mapping was introduced by
Sempere [24] in order to compute the language denoted by a linear expression.
Linear expressions denote linear context-free languages, and are equivalent to
the regular-like expressions of Brzozowski [3].
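The mapping $\mathrm{Im}$ can be illustrated by a short Python sketch. The encoding is an assumption made for the example: a word over $\Sigma_\Gamma$ is a list of pairs of strings, and the empty string stands for $\varepsilon$.

```python
def im(word):
    """Im(eps) = eps and Im((x, y) . w) = x . Im(w) . y.

    `word` is a list of couples (x, y); each component is a one-symbol
    string or "" (standing for the empty word).
    """
    if not word:
        return ""
    (x, y), rest = word[0], word[1:]
    return x + im(rest) + y
```

The first couple of the word contributes the outermost symbols of the image, the second couple the next ones, and so on towards the centre.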

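Definition 2. Let $A = (\Sigma, Q, I, F, \delta)$ be an NFA. The NFA $A$ is a couple NFA
if there exists an alphabet $\Gamma$ such that $\Sigma \subseteq \Sigma_\Gamma$. In this case, $A$ is called a $\Gamma$-couple
NFA. The $\Gamma$-language of a $\Gamma$-couple NFA $A$ is the subset $L_\Gamma(A)$ of $\Gamma^*$
defined by: $L_\Gamma(A) = \{\mathrm{Im}(w) \mid w \in L(A)\}$.

The definition of right languages and their classical properties extend to
couple NFAs as follows. Let $A = (\Sigma, Q, I, F, \delta)$ be a $\Gamma$-couple NFA and $q$ be
a state in $Q$. The $\Gamma$-right language of $q$ is the subset $\overrightarrow{L}_\Gamma(q)$ of $\Gamma^*$ defined by:
$\overrightarrow{L}_\Gamma(q) = \{\mathrm{Im}(w) \mid w \in \overrightarrow{L}(q)\}$.

Lemma 2. Let $A = (\Sigma, Q, I, F, \delta)$ be a $\Gamma$-couple NFA and $q$ be a state in $Q$.
Then: $L_\Gamma(A) = \bigcup_{i \in I} \overrightarrow{L}_\Gamma(i)$.

Proof. Trivially deduced from Definition 2, from the definition of $\Gamma$-right languages
and from the fact that $L(A) = \bigcup_{i \in I} \overrightarrow{L}(i)$. □

Lemma 3. Let $A = (\Sigma, Q, I, F, \delta)$ be a $\Gamma$-couple NFA and $q$ be a state in $Q$.
Then: $\overrightarrow{L}_\Gamma(q) = \{\varepsilon \mid q \in F\} \cup \bigcup_{(x, y) \in \Sigma,\, q' \in \delta(q, (x, y))} \{x\} \cdot \overrightarrow{L}_\Gamma(q') \cdot \{y\}$.

Proof. Trivially deduced from Definition 2, from the definition of $\Gamma$-right languages
and from the fact that $\overrightarrow{L}(q) = \{\varepsilon \mid q \in F\} \cup \bigcup_{a \in \Sigma,\, q' \in \delta(q, a)} \{a\} \cdot \overrightarrow{L}(q')$. □

Corollary 2. Let $A = (\Sigma, Q, I, F, \delta)$ be a $\Gamma$-couple NFA, $(x, y)$ be a couple in
$\Sigma_\Gamma$ and $q$ be a state in $Q$. Then: $(x, y)^{-1}(\overrightarrow{L}_\Gamma(q)) = \bigcup_{q' \in \delta(q, (x, y))} \overrightarrow{L}_\Gamma(q')$.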

The following example illustrates the fact that there exist non-regular lan- 
guages that can be recognized by couple NFAs. 

Example 1. Let $\Gamma = \{a, b\}$ and $A$ be the automaton of Figure 2. The
$\Gamma$-language of $A$ is $L_\Gamma(A) = \{a^n b^n \mid n \in \mathbb{N}\}$.

(a, b)

Figure 2: The Couple Automaton $A$.

As a consequence there exist non-regular languages that are recognized by 
a couple NFA. In fact, the family of languages recognized by couple NFAs is 
exactly the family of linear context-free languages. 

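Proposition 1. The $\Gamma$-language recognized by a $\Gamma$-couple NFA is linear context-free.

Proof. Let $A = (\Sigma, Q, I, F, \delta)$. Let us define the grammar $G = (X, V, P, S)$ by:

• $X = \Gamma$, the set of terminal symbols,

• $V = \{A_q \mid q \in Q\} \cup \{S\}$, the set of non-terminal symbols,

• $P = \{S \to A_q \mid q \in I\} \cup \{A_q \to \varepsilon \mid q \in F\} \cup \{A_q \to \alpha A_{q'} \beta \mid q' \in \delta(q, (\alpha, \beta))\}$, the set of productions,

• $S$, the axiom.

1. Let $w$ be a word in $\Gamma^*$. Let us first show that $w$ belongs to the language
generated by the grammar $G_q = (X, V, P, A_q)$ if and only if it is in $\overrightarrow{L}_\Gamma(q)$,
by induction over the length of $w$.

(a) Let us suppose that $w = \varepsilon$. By construction of $G_q$, $A_q \to \varepsilon$ if and
only if $q \in F$, i.e. $\varepsilon \in \overrightarrow{L}_\Gamma(q)$.

(b) Let us suppose that $w = \alpha w' \beta$ with $(\alpha, \beta) \neq (\varepsilon, \varepsilon)$. By definition
of $L(G_q)$, $w \in L(G_q)$ if and only if there exists a symbol $A_{q'}$ in $V$ such that
$A_q \to \alpha A_{q'} \beta$ and $w' \in L(G_{q'})$. By induction hypothesis, it holds
that $w' \in L(G_{q'}) \Leftrightarrow w' \in \overrightarrow{L}_\Gamma(q')$. Since by construction $A_q \to \alpha A_{q'} \beta \Leftrightarrow q' \in \delta(q, (\alpha, \beta))$
and since according to Lemma 3, $\overrightarrow{L}_\Gamma(q) = \{\varepsilon \mid q \in F\} \cup \bigcup_{(x, y) \in \Sigma,\, q' \in \delta(q, (x, y))} \{x\} \cdot \overrightarrow{L}_\Gamma(q') \cdot \{y\}$, it holds that
$w \in L(G_q) \Leftrightarrow w \in \overrightarrow{L}_\Gamma(q)$.

2. Since $L(G) = \bigcup_{q \mid S \to A_q} L(G_q)$, it holds from (1) that $L(G) = \bigcup_{i \in I} \overrightarrow{L}_\Gamma(i)$,
which equals $L_\Gamma(A)$ according to Lemma 2.

Finally, since the $\Gamma$-language of $A$ is generated by a linear grammar, it is
linear context-free. □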
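The construction used in this proof can be sketched in Python as follows. This is an illustrative sketch under assumed encodings, not part of the original paper: a transition is a triple `(q, (x, y), q2)`, the empty string stands for $\varepsilon$, a production is a tuple `(lhs, x, nonterminal_or_None, y)`, and the bounded generator `generate` is a hypothetical helper used only to inspect the grammar.

```python
def couple_nfa_to_grammar(initials, finals, delta):
    """Build the productions S -> A_q, A_q -> eps and A_q -> x A_q' y."""
    prods = [("S", "", ("A", q), "") for q in initials]
    prods += [(("A", q), "", None, "") for q in finals]
    prods += [(("A", q), x, ("A", q2), y) for (q, (x, y), q2) in delta]
    return prods

def generate(prods, start, depth):
    """Words derivable from `start` using at most `depth` nested productions."""
    if depth == 0:
        return set()
    words = set()
    for lhs, x, nonterminal, y in prods:
        if lhs != start:
            continue
        if nonterminal is None:
            words.add(x + y)
        else:
            words |= {x + w + y for w in generate(prods, nonterminal, depth - 1)}
    return words
```

Applied to the one-state automaton of Example 1 (state `0` initial and final, with a loop labelled `("a", "b")`), the generated words are prefixes of the enumeration of $\{a^n b^n\}$.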

Proposition 2. The language generated by a linear grammar is recognized by 
a couple NFA. 

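Proof. Let $G = (X, V, P, S)$ be a linear grammar. Let us define the automaton
$A = (\Sigma, Q, I, F, \delta)$ by:

• $\Sigma = \Sigma_X$,

• $Q = V$,

• $I = \{S\}$,

• $F = \{B \in V \mid (B \to \varepsilon) \in P\}$,

• $B' \in \delta(B, (x, y)) \Leftrightarrow (B \to xB'y) \in P$.

For any symbol $B$ in $V$, let us set $G_B = (X, V, P, B)$. Let $w$ be a word in $X^*$.
Let us show by induction over the length of $w$ that $w \in L(G_B) \Leftrightarrow w \in \overrightarrow{L}_X(B)$.

1. Let $w = \varepsilon$. Then $\varepsilon \in L(G_B)$ if and only if $(B \to \varepsilon) \in P$. By construction,
it is equivalent to $B \in F$ and to $\varepsilon \in \overrightarrow{L}_X(B)$.

2. Let us suppose that $w$ is different from $\varepsilon$. Then by induction hypothesis
and according to Lemma 3:

$w \in L(G_B)$

$\Leftrightarrow \exists (x, y) \in \Sigma_X, w' \in X^*, B' \in V \mid w = xw'y \wedge (B \to xB'y) \in P \wedge w' \in L(G_{B'})$

$\Leftrightarrow \exists (x, y) \in \Sigma_X, w' \in X^*, B' \in V \mid w = xw'y \wedge B' \in \delta(B, (x, y)) \wedge w' \in \overrightarrow{L}_X(B')$

$\Leftrightarrow w \in \overrightarrow{L}_X(B)$.

Finally, since $L(G) = L(G_S) = \overrightarrow{L}_X(S)$, it holds from Lemma 2 that $L(G) = L_X(A)$. □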

Theorem 1. A language is linear context-free if and only if it is recognized by 
a couple NFA. 

Proof. Directly from Proposition 1 and from Proposition 2. □

We present here two algorithms in order to solve the membership problem^2
via a couple NFA. Algorithm 2 checks whether the word $w \in \Gamma^*$ is recognized
by the $\Gamma$-couple NFA $A$. It returns TRUE if there exists an initial state
such that its $\Gamma$-right language contains $w$. Algorithm 1 checks whether the
word $w \in \Gamma^*$ is in the $\Gamma$-right language of the state $q$.



Algorithm 1 IsInRightLanguage($A$, $w$, $q$)

Require: $A = (\Sigma, Q, I, F, \delta)$ a $\Gamma$-couple NFA, $w$ a word in $\Gamma^*$, $q$ a state in $Q$
Ensure: Returns ($w \in \overrightarrow{L}_\Gamma(q)$)

if $w = \varepsilon$ then
    $P \leftarrow (q \in F)$
else
    $P \leftarrow$ FALSE
    for all $(q, (\alpha, \beta), q') \in \delta \mid w = \alpha w' \beta$ do
        $P \leftarrow P \vee$ IsInRightLanguage($A$, $w'$, $q'$)
    end for
end if
return $P$



Algorithm 2 MembershipTest($A$, $w$)

Require: $A = (\Sigma, Q, I, F, \delta)$ a $\Gamma$-couple NFA, $w$ a word in $\Gamma^*$
Ensure: Returns ($w \in L_\Gamma(A)$)

$R \leftarrow$ FALSE
for all $i \in I$ do
    $R \leftarrow R \vee$ IsInRightLanguage($A$, $w$, $i$)
end for
return $R$
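Algorithms 1 and 2 translate almost literally into Python. The following is a minimal sketch under assumed encodings (not the authors' code): the transition function is a list of triples `(q, (x, y), q2)` in which `x` and `y` are one-symbol strings or `""` for $\varepsilon$, never both empty.

```python
def is_in_right_language(delta, finals, w, q):
    """Algorithm 1: does w belong to the Gamma-right language of state q?"""
    if w == "":
        return q in finals
    for (p, (x, y), p2) in delta:
        if p != q:
            continue
        # w must split as x . w' . y; the length test keeps x and y disjoint.
        if len(w) >= len(x) + len(y) and w.startswith(x) and w.endswith(y):
            inner = w[len(x):len(w) - len(y)]
            if is_in_right_language(delta, finals, inner, p2):
                return True
    return False

def membership_test(initials, finals, delta, w):
    """Algorithm 2: does w belong to the Gamma-language of the couple NFA?"""
    return any(is_in_right_language(delta, finals, w, i) for i in initials)
```

Since every label differs from $(\varepsilon, \varepsilon)$, each recursive call strictly shortens the word, so the recursion terminates. With the automaton of Example 1, encoded as `initials = {0}`, `finals = {0}` and `delta = [(0, ("a", "b"), 0)]`, the test accepts exactly the words of the form $a^n b^n$.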



Proposition 3. Let $A = (\Sigma, Q, I, F, \delta)$ be a $\Gamma$-couple NFA, $q$ be a state in $Q$
and $w$ be a word in $\Gamma^*$. The two following propositions are satisfied:

1. Algorithm 1 IsInRightLanguage($A$, $w$, $q$) returns ($w \in \overrightarrow{L}_\Gamma(q)$),

2. Algorithm 2 MembershipTest($A$, $w$) returns ($w \in L_\Gamma(A)$).

Proof. Let $w$ be a word in $\Gamma^*$.

1. Let us show by induction over the length of $w$ that the algorithm
IsInRightLanguage($A$, $w$, $q$) returns ($w \in \overrightarrow{L}_\Gamma(q)$).

If $w = \varepsilon$, $P = \mathrm{TRUE} \Leftrightarrow q \in F \Leftrightarrow \varepsilon \in \overrightarrow{L}_\Gamma(q)$.

Let us suppose now that $|w| \ge 1$. Then $P = \bigvee_{(q, (\alpha, \beta), q') \in \delta \mid w = \alpha w' \beta}$ IsInRightLanguage($A$, $w'$, $q'$).
If there is no transition $(q, (\alpha, \beta), q') \in \delta$,
then trivially $w \notin \overrightarrow{L}_\Gamma(q)$. For any $(q, (\alpha, \beta), q') \in \delta$, let us notice that
$(\alpha, \beta) \in \Sigma_\Gamma$. As a consequence, the length of any word $w'$ satisfying
$w = \alpha w' \beta \wedge (q, (\alpha, \beta), q') \in \delta$ is strictly smaller than $|w|$. Let $w'$ be a
word satisfying $w = \alpha w' \beta \wedge (q, (\alpha, \beta), q') \in \delta$. According to the induction
hypothesis, IsInRightLanguage($A$, $w'$, $q'$) returns ($w' \in \overrightarrow{L}_\Gamma(q')$). Hence
$P = \bigvee_{(q, (\alpha, \beta), q') \in \delta \mid w = \alpha w' \beta} (w' \in \overrightarrow{L}_\Gamma(q'))$. Finally, according to Lemma 3,
$P = (w \in \overrightarrow{L}_\Gamma(q))$.

2. Since $R = \bigvee_{i \in I}$ IsInRightLanguage($A$, $w$, $i$), it holds as a direct consequence
that $R = \bigvee_{i \in I} (w \in \overrightarrow{L}_\Gamma(i))$. Hence, according to Lemma 2, it
holds that $R = (w \in L_\Gamma(A))$.

□

^2 Given a language $L$ and a word $w$, does $w$ belong to $L$?

The following sections are devoted to hairpin completions and their two- 
sided residuals. It turns out that hairpin completions are linear context-free 
languages. Hence, we show how to compute a couple NFA that recognizes a 
given hairpin completion. 

4 Hairpin Completion of a Language and its Residuals

Let $\Gamma$ be an alphabet. An involution $f$ over $\Gamma$ is a mapping from $\Gamma$ to $\Gamma$ satisfying
for any symbol $a$ in $\Gamma$, $f(f(a)) = a$. An anti-morphism $\mu$ over $\Gamma^*$ is a mapping
from $\Gamma^*$ to $\Gamma^*$ satisfying for any two words $u$ and $v$ in $\Gamma^*$, $\mu(u \cdot v) = \mu(v) \cdot \mu(u)$.
Any mapping $g$ from $\Gamma$ to $\Gamma$ can be extended as an anti-morphism over $\Gamma^*$ as
follows: $\forall a \in \Gamma$, $\forall w \in \Gamma^*$, $g(\varepsilon) = \varepsilon$, $g(a \cdot w) = g(w) \cdot g(a)$.

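Definition 3. Let $\Gamma$ be an alphabet and $H$ be an anti-morphism over $\Gamma^*$. Let $L_1$
and $L_2$ be two languages over $\Gamma$. Let $k > 0$ be an integer. The $(H, k)$-completion
of $L_1$ and $L_2$ is the language $H_k(L_1, L_2)$ defined by:

$H_k(L_1, L_2) = \{\alpha\beta\gamma H(\beta)H(\alpha) \mid \alpha, \beta, \gamma \in \Gamma^* \wedge (\alpha\beta\gamma H(\beta) \in L_1 \vee \beta\gamma H(\beta)H(\alpha) \in L_2) \wedge |\beta| = k\}$.

The $(H, k)$-completion operator can be defined as the union of two unary
operators $\overrightarrow{H}_k$ and $\overleftarrow{H}_k$.

Definition 4. Let $\Gamma$ be an alphabet and $H$ be an anti-morphism over $\Gamma^*$. Let
$L$ be a language over $\Gamma$. Let $k > 0$ be an integer. The right (resp. left) $(H, k)$-completion
of $L$ is the language $\overrightarrow{H}_k(L)$ (resp. $\overleftarrow{H}_k(L)$) defined by:

$\overrightarrow{H}_k(L) = \{\alpha\beta\gamma H(\beta)H(\alpha) \mid \alpha, \beta, \gamma \in \Gamma^* \wedge \alpha\beta\gamma H(\beta) \in L \wedge |\beta| = k\}$,
$\overleftarrow{H}_k(L) = \{\alpha\beta\gamma H(\beta)H(\alpha) \mid \alpha, \beta, \gamma \in \Gamma^* \wedge \beta\gamma H(\beta)H(\alpha) \in L \wedge |\beta| = k\}$.

Lemma 4. Let $\Gamma$ be an alphabet and $H$ be an anti-morphism over $\Gamma^*$. Let $L_1$
and $L_2$ be two languages over $\Gamma$. Let $k > 0$ be an integer. Then:
$H_k(L_1, L_2) = \overrightarrow{H}_k(L_1) \cup \overleftarrow{H}_k(L_2)$.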
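Proof. Let $w$ be a word in $\Gamma^*$.

$w \in H_k(L_1, L_2) \Leftrightarrow w = \alpha\beta\gamma H(\beta)H(\alpha) \wedge (\alpha\beta\gamma H(\beta) \in L_1 \vee \beta\gamma H(\beta)H(\alpha) \in L_2) \wedge |\beta| = k$

$\Leftrightarrow (w = \alpha\beta\gamma H(\beta)H(\alpha) \wedge \alpha\beta\gamma H(\beta) \in L_1 \wedge |\beta| = k) \vee (w = \alpha\beta\gamma H(\beta)H(\alpha) \wedge \beta\gamma H(\beta)H(\alpha) \in L_2 \wedge |\beta| = k)$

$\Leftrightarrow w \in \overrightarrow{H}_k(L_1) \vee w \in \overleftarrow{H}_k(L_2) \Leftrightarrow w \in \overrightarrow{H}_k(L_1) \cup \overleftarrow{H}_k(L_2)$.

□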
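For finite languages, the right and left completions of Definition 4 can be enumerated directly. The Python sketch below is an illustration under stated assumptions: languages are finite sets of strings, `h` is a dictionary giving $H$ on symbols, $k \ge 1$, and, for the left completion only, `h` is assumed to be an involution so that $\alpha$ can be recovered from $H(\alpha)$. The function names are hypothetical.

```python
def anti(h, w):
    """Extend the symbol map h to an anti-morphism: H(a . w) = H(w) . H(a)."""
    return "".join(h[c] for c in reversed(w))

def right_completion(L, h, k):
    """Words alpha beta gamma H(beta) H(alpha) with alpha beta gamma H(beta) in L, |beta| = k."""
    out = set()
    for u in L:
        n = len(u)
        for i in range(0, n - 2 * k + 1):          # alpha = u[:i]
            if u[n - k:] == anti(h, u[i:i + k]):    # suffix of length k is H(beta)
                out.add(u + anti(h, u[:i]))
    return out

def left_completion(L, h, k):
    """Words alpha beta gamma H(beta) H(alpha) with beta gamma H(beta) H(alpha) in L, |beta| = k.

    Assumes h is an involution, so that alpha = H(H(alpha)).
    """
    out = set()
    for u in L:
        for j in range(2 * k, len(u) + 1):          # H(alpha) = u[j:]
            core, tail = u[:j], u[j:]
            if core[j - k:] == anti(h, core[:k]):
                out.add(anti(h, tail) + u)
    return out

def completion(L1, L2, h, k):
    """(H, k)-completion of L1 and L2, computed as the union stated in Lemma 4."""
    return right_completion(L1, h, k) | left_completion(L2, h, k)
```

For example, with `h = {"a": "a", "b": "c", "c": "b"}` and $k = 1$, the word `"aabc"` is completed on the right into `"aabcaa"`.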

When $H$ is an involution over $\Gamma$, the $(H, k)$-completion of $L_1$ and $L_2$ is called
a hairpin completion [9]. Even in the case where $H$ is not an involution, we will
say that languages such as $\overrightarrow{H}_k(L)$, $\overleftarrow{H}_k(L)$ or $H_k(L, L')$ are hairpin completed
languages and we will speak of hairpin completions. We first establish formulae
in this general setting in order to compute the two-sided residuals of the
completed language of an arbitrary language. The following operator is useful.

Definition 5. Let $\Gamma$ be an alphabet and $H$ be an anti-morphism over $\Gamma^*$. Let $L$
be a language over $\Gamma$. Let $k > 0$ be an integer. The language $H'_k(L)$
is defined by: $H'_k(L) = \{\beta\gamma H(\beta) \in L \mid \beta, \gamma \in \Gamma^* \wedge |\beta| = k\}$.

We split the computation of two-sided residuals of a completed language
w.r.t. couples $(x, y)$: the first case is when both $x$ and $y$ are symbols.

Lemma 5. Let $\Gamma$ be an alphabet and $H$ be an anti-morphism over $\Gamma^*$. Let $L$ be
a language over $\Gamma$. Let $k > 0$ be an integer. Let $L'$ be a language in
$\{\overrightarrow{H}_k(L), \overleftarrow{H}_k(L), H'_k(L)\}$. Let $w$ be a word in $\Gamma^*$. Then:

$w \in L' \Rightarrow |w| > k \wedge \exists a \in \Gamma, \exists w' \in \Gamma^*, w = a w' H(a)$.

Proof. Trivially deduced from Definition 4 and Definition 5. □

Corollary 3. Let $\Gamma$ be an alphabet and $H$ be an anti-morphism over $\Gamma^*$. Let $L$
be a language over $\Gamma$. Let $k > 0$ be an integer. Let $L'$ be a language
in $\{\overrightarrow{H}_k(L), \overleftarrow{H}_k(L), H'_k(L)\}$. Then: $L' = \bigcup_{x \in \Gamma} \{x\} \cdot ((x, H(x))^{-1}(L')) \cdot \{H(x)\}$.

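Proposition 4. Let $\Gamma$ be an alphabet and $H$ be an anti-morphism over $\Gamma^*$. Let
$L$ be a language over $\Gamma$. Let $(x, y)$ be a couple of symbols in $\Gamma \times \Gamma$. Let $k > 0$ be
an integer. Then:

$(x, y)^{-1}(\overrightarrow{H}_k(L)) = \begin{cases} \emptyset & \text{if } y \neq H(x), \\ \overrightarrow{H}_k(x^{-1}(L)) \cup (x, y)^{-1}(L) & \text{if } y = H(x) \wedge k = 1, \\ \overrightarrow{H}_k(x^{-1}(L)) \cup H'_{k-1}((x, y)^{-1}(L)) & \text{otherwise,} \end{cases}$

$(x, y)^{-1}(\overleftarrow{H}_k(L)) = \begin{cases} \emptyset & \text{if } y \neq H(x), \\ \overleftarrow{H}_k((L)y^{-1}) \cup (x, y)^{-1}(L) & \text{if } y = H(x) \wedge k = 1, \\ \overleftarrow{H}_k((L)y^{-1}) \cup H'_{k-1}((x, y)^{-1}(L)) & \text{otherwise,} \end{cases}$

$(x, y)^{-1}(H'_k(L)) = \begin{cases} \emptyset & \text{if } y \neq H(x), \\ H'_{k-1}((x, y)^{-1}(L)) & \text{if } y = H(x) \wedge k > 1, \\ (x, y)^{-1}(L) & \text{otherwise.} \end{cases}$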
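Proof. Let $w$ be a word in $\Gamma^*$. According to Lemma 5, any word $u$ in $\overrightarrow{H}_k(L) \cup \overleftarrow{H}_k(L) \cup H'_k(L)$
can be split up into $avb$ with $b = H(a)$. As a consequence,
whenever $y \neq H(x)$, it holds that $(x, y)^{-1}(\overrightarrow{H}_k(L)) = (x, y)^{-1}(\overleftarrow{H}_k(L)) = (x, y)^{-1}(H'_k(L)) = \emptyset$.
Let us suppose now that $y = H(x)$.

(I) Let us define the languages $L_1$ and $L_2$ by:

$L_1 = (x, y)^{-1}(\overrightarrow{H}_k(L))$,
$L_2 = \begin{cases} \overrightarrow{H}_k(x^{-1}(L)) \cup H'_{k-1}((x, y)^{-1}(L)) & \text{if } k > 1, \\ \overrightarrow{H}_k(x^{-1}(L)) \cup (x, y)^{-1}(L) & \text{otherwise.} \end{cases}$

Then:

$w \in L_1 \Leftrightarrow xwy \in \overrightarrow{H}_k(L)$

$\Leftrightarrow (xwy = x\alpha\beta\gamma H(\beta)H(\alpha)y \wedge y = H(x) \wedge x\alpha\beta\gamma H(\beta) \in L \wedge |\beta| = k)$
$\quad \vee\ (xwy = x\beta\gamma H(\beta)y \wedge y = H(x) \wedge x\beta\gamma H(\beta)y \in L \wedge |\beta| = k - 1)$

$\Leftrightarrow (w = \alpha\beta\gamma H(\beta)H(\alpha) \wedge y = H(x) \wedge \alpha\beta\gamma H(\beta) \in x^{-1}(L) \wedge |\beta| = k)$
$\quad \vee\ (w = \beta\gamma H(\beta) \wedge y = H(x) \wedge \beta\gamma H(\beta) \in (x, y)^{-1}(L) \wedge |\beta| = k - 1)$

$\Leftrightarrow (w = \alpha\beta\gamma H(\beta)H(\alpha) \wedge y = H(x) \wedge w \in \overrightarrow{H}_k(x^{-1}(L)))$
$\quad \vee\ (w = \beta\gamma H(\beta) \wedge y = H(x) \wedge w \in H'_{k-1}((x, y)^{-1}(L)) \wedge k \neq 1)$
$\quad \vee\ (w = \gamma \wedge y = H(x) \wedge w \in (x, y)^{-1}(L) \wedge k = 1)$

$\Leftrightarrow w \in L_2$.

(II) Let us set:

$L_1 = (x, y)^{-1}(\overleftarrow{H}_k(L))$,
$L_2 = \begin{cases} \overleftarrow{H}_k((L)y^{-1}) \cup H'_{k-1}((x, y)^{-1}(L)) & \text{if } k > 1, \\ \overleftarrow{H}_k((L)y^{-1}) \cup (x, y)^{-1}(L) & \text{otherwise.} \end{cases}$

Then:

$w \in L_1 \Leftrightarrow xwy \in \overleftarrow{H}_k(L)$

$\Leftrightarrow (xwy = x\alpha\beta\gamma H(\beta)H(\alpha)y \wedge y = H(x) \wedge \beta\gamma H(\beta)H(\alpha)y \in L \wedge |\beta| = k)$
$\quad \vee\ (xwy = x\beta\gamma H(\beta)y \wedge y = H(x) \wedge x\beta\gamma H(\beta)y \in L \wedge |\beta| = k - 1)$

$\Leftrightarrow (w = \alpha\beta\gamma H(\beta)H(\alpha) \wedge y = H(x) \wedge \beta\gamma H(\beta)H(\alpha) \in (L)y^{-1} \wedge |\beta| = k)$
$\quad \vee\ (w = \beta\gamma H(\beta) \wedge y = H(x) \wedge \beta\gamma H(\beta) \in (x, y)^{-1}(L) \wedge |\beta| = k - 1)$

$\Leftrightarrow (w = \alpha\beta\gamma H(\beta)H(\alpha) \wedge y = H(x) \wedge w \in \overleftarrow{H}_k((L)y^{-1}))$
$\quad \vee\ (w = \beta\gamma H(\beta) \wedge y = H(x) \wedge w \in H'_{k-1}((x, y)^{-1}(L)) \wedge k \neq 1)$
$\quad \vee\ (w = \gamma \wedge y = H(x) \wedge w \in (x, y)^{-1}(L) \wedge k = 1)$

$\Leftrightarrow w \in L_2$.

(III) Let us set:

$L_1 = (x, y)^{-1}(H'_k(L))$,
$L_2 = H'_{k-1}((x, y)^{-1}(L))$,
$L_3 = (x, y)^{-1}(L)$.

Then:

$w \in L_1 \Leftrightarrow xwy \in H'_k(L)$

$\Leftrightarrow xwy = x\beta\gamma H(\beta)y \wedge y = H(x) \wedge x\beta\gamma H(\beta)y \in L \wedge |\beta| = k - 1$

$\Leftrightarrow w = \beta\gamma H(\beta) \wedge y = H(x) \wedge \beta\gamma H(\beta) \in (x, y)^{-1}(L) \wedge |\beta| = k - 1$

$\Leftrightarrow (w = \beta\gamma H(\beta) \wedge y = H(x) \wedge w \in H'_{k-1}((x, y)^{-1}(L)) \wedge k > 1) \vee (w \in (x, y)^{-1}(L) \wedge k = 1)$

$\Leftrightarrow (w \in L_2 \wedge k > 1) \vee (w \in L_3 \wedge k = 1)$.

□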
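The first formula of Proposition 4 can be checked numerically on small finite languages. The following Python sketch is a sanity check under stated assumptions, not a proof: languages are finite sets of strings, `h` is a dictionary giving $H$ on symbols, all function names are hypothetical, and `h_prime` is allowed to take $k = 0$, in which case it returns its argument unchanged, so that the cases $k = 1$ and $k > 1$ are handled by the same expression.

```python
def anti(h, w):
    """Anti-morphic extension of the symbol map h."""
    return "".join(h[c] for c in reversed(w))

def right_completion(L, h, k):
    """Right (H, k)-completion of a finite language L (k >= 1)."""
    out = set()
    for u in L:
        n = len(u)
        for i in range(0, n - 2 * k + 1):
            if u[n - k:] == anti(h, u[i:i + k]):
                out.add(u + anti(h, u[:i]))
    return out

def h_prime(L, h, k):
    """H'_k(L): words of L of the form beta gamma H(beta) with |beta| = k (k = 0 gives L)."""
    return {u for u in L if len(u) >= 2 * k and u[len(u) - k:] == anti(h, u[:k])}

def left_residual(L, x):
    """x^{-1}(L) for a symbol x."""
    return {w[1:] for w in L if w[:1] == x}

def residual(L, x, y):
    """(x, y)^{-1}(L) for two symbols x and y."""
    return {w[1:-1] for w in L if len(w) >= 2 and w[0] == x and w[-1] == y}

def check_prop4_right(L, h, k, x):
    """Compare both sides of the formula for the right completion, with y = H(x)."""
    y = h[x]
    lhs = residual(right_completion(L, h, k), x, y)
    rhs = right_completion(left_residual(L, x), h, k) | h_prime(residual(L, x, y), h, k - 1)
    return lhs == rhs
```

With `h = {"a": "a", "b": "c", "c": "b"}`, calling `check_prop4_right({"aabc", "bc", "abca"}, h, 1, "a")` compares the residual of the completed language with the completion of the residual plus the correction term.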



The problem with two-sided residuals of a hairpin completion w.r.t. couples
$(x, y)$ with either $x$ or $y$ equal to $\varepsilon$ is that they add one catenation that has to
be memorized. It can be checked that this may lead to infinite sets of two-sided
residuals.

Proposition 5. Let $\Gamma$ be an alphabet and $H$ be an anti-morphism over $\Gamma^*$. Let
$L$ be a language over $\Gamma$. Let $k > 0$ be an integer. Let $L'$ be a
language in $\{\overrightarrow{H}_k(L), \overleftarrow{H}_k(L), H'_k(L)\}$. Let $x$ be a symbol in $\Gamma$. Then:
$(x, \varepsilon)^{-1}(L') = (x, H(x))^{-1}(L') \cdot \{H(x)\}$,
$(\varepsilon, x)^{-1}(L') = \bigcup_{z \in \Gamma \mid H(z) = x} \{z\} \cdot (z, x)^{-1}(L')$.

Proof. Directly deduced from Lemma 1 and from Corollary 3. □

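Let $L$ be a language over an alphabet $\Gamma$. The set $\mathcal{R}_L$ of two-sided residuals
of $L$ is defined by: $\mathcal{R}_L = \bigcup_{k \ge 1} \mathcal{R}^k_L$ where

$\mathcal{R}^k_L = \begin{cases} \{(x, y)^{-1}(L) \mid (x, y) \in \Sigma_\Gamma\} & \text{if } k = 1, \\ \{(x, y)^{-1}(L') \mid (x, y) \in \Sigma_\Gamma \wedge L' \in \mathcal{R}^{k-1}_L\} & \text{otherwise.} \end{cases}$

From now on we focus on hairpin completion of regular languages. Let us
recall that such a completion is not necessarily regular [5].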

Lemma 6. The family of regular languages is not closed under hairpin completion.

Proof. Let $\Gamma = \{a, b, c\}$, $k > 0$ be a fixed integer and $H$ be the anti-morphism
over $\Gamma^*$ defined by $H(a) = a$, $H(b) = c$ and $H(c) = b$. Let $L' = \overrightarrow{H}_k(L(a^* b^k c^k))$.
Let us first show that $L' = \{a^n b^k c^k a^n \mid n \ge 0\}$. Let $w$ be a word in $\Gamma^*$.

$w \in L' \Leftrightarrow w = \alpha\beta\gamma H(\beta)H(\alpha) \wedge \alpha\beta\gamma H(\beta) \in L(a^* b^k c^k) \wedge |\beta| = k$
$\Leftrightarrow w = \alpha\beta\gamma H(\beta)H(\alpha) \wedge \alpha \in L(a^*) \wedge H(\beta) = c^k \wedge \beta = b^k$
$\Leftrightarrow w = a^n b^k c^k a^n$ with $n \ge 0$.

For any integer $j \ge 0$, let us define the language $L'_j$ by:

$L'_j = \begin{cases} L' & \text{if } j = 0, \\ a^{-1}(L'_{j-1}) & \text{otherwise.} \end{cases}$

Consequently, it holds that $L'_j = \{a^{n-j} b^k c^k a^n \mid n \ge j\}$. Finally, since for any
two distinct integers $j$ and $j'$, the word $b^k c^k a^j$ belongs to $L'_j \setminus L'_{j'}$, it holds that
for any two distinct integers $j$ and $j'$, $L'_j \neq L'_{j'}$ and $(a^j)^{-1}(L') \neq (a^{j'})^{-1}(L')$.
As a consequence, the set of left residuals of $L'$ is infinite. □

The set of two-sided residuals of a hairpin completion of a regular language
may be infinite, but the restriction to residuals w.r.t. couples $(x, y)$ of symbols
is sufficient to obtain a finite set of two-sided residuals and a finite recognizer.



5 The Two-Sided Derived Term Automaton

The computation of residuals is intractable when it is defined over languages.
However, derived terms of regular expressions denote residuals of regular
languages. We then extend the partial derivation of regular expressions [1] to the
partial derivation of hairpin expressions.

A hairpin expression $E$ over an alphabet $\Gamma$ is a regular expression over $\Gamma$ or
is inductively defined by: $E = \overleftarrow{H}_k(F)$, $E = \overrightarrow{H}_k(F)$, $E = H'_k(F)$, $E = G_1 + G_2$,
where $H$ is any anti-morphism over $\Gamma^*$, $k > 0$ is any integer, $F$ is any regular
expression over $\Gamma$, and $G_1$ and $G_2$ are any two hairpin expressions over $\Gamma$.
If the only operators appearing in $E$ are regular operators ($+$, $\cdot$ or $^*$), the
expression $E$ is said to be a simple hairpin expression. The language denoted
by a hairpin expression $E$ over an alphabet $\Gamma$ is the regular language $L(E)$ if
$E$ is a regular expression or is inductively defined by: $L(\overleftarrow{H}_k(F)) = \overleftarrow{H}_k(L(F))$,
$L(\overrightarrow{H}_k(F)) = \overrightarrow{H}_k(L(F))$, $L(H'_k(F)) = H'_k(L(F))$, $L(G_1 + G_2) = L(G_1) \cup L(G_2)$,
where $H$ is any anti-morphism over $\Gamma^*$, $k > 0$ is any integer, $F$ is any regular
expression over $\Gamma$, and $G_1$ and $G_2$ are any two hairpin expressions over $\Gamma$.

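Definition 6. Let $E$ be a hairpin expression over an alphabet $\Gamma$. Let $(x, y)$
be a couple of symbols in $\Sigma_\Gamma$. Let $k > 0$ be an integer. The two-sided partial
derivative of $E$ w.r.t. $(x, y)$ is the set $\frac{\partial}{\partial_{(x,y)}}(E)$ of hairpin expressions defined
by:

$\frac{\partial}{\partial_{(x,y)}}(F) = \begin{cases} \frac{\partial}{\partial_x}(F) & \text{if } y = \varepsilon, \\ (F)\frac{\partial}{\partial_y} & \text{if } x = \varepsilon, \\ \bigcup_{F' \in \frac{\partial}{\partial_x}(F)} (F')\frac{\partial}{\partial_y} & \text{otherwise,} \end{cases}$

$\frac{\partial}{\partial_{(x,y)}}(\overrightarrow{H}_k(F)) = \begin{cases} \emptyset & \text{if } y \neq H(x), \\ \overrightarrow{H}_k(\frac{\partial}{\partial_x}(F)) \cup \frac{\partial}{\partial_{(x,y)}}(F) & \text{if } y = H(x) \wedge k = 1, \\ \overrightarrow{H}_k(\frac{\partial}{\partial_x}(F)) \cup H'_{k-1}(\frac{\partial}{\partial_{(x,y)}}(F)) & \text{otherwise,} \end{cases}$

$\frac{\partial}{\partial_{(x,y)}}(\overleftarrow{H}_k(F)) = \begin{cases} \emptyset & \text{if } y \neq H(x), \\ \overleftarrow{H}_k((F)\frac{\partial}{\partial_y}) \cup \frac{\partial}{\partial_{(x,y)}}(F) & \text{if } y = H(x) \wedge k = 1, \\ \overleftarrow{H}_k((F)\frac{\partial}{\partial_y}) \cup H'_{k-1}(\frac{\partial}{\partial_{(x,y)}}(F)) & \text{otherwise,} \end{cases}$

$\frac{\partial}{\partial_{(x,y)}}(H'_k(F)) = \begin{cases} \emptyset & \text{if } y \neq H(x), \\ H'_{k-1}(\frac{\partial}{\partial_{(x,y)}}(F)) & \text{if } y = H(x) \wedge k > 1, \\ \frac{\partial}{\partial_{(x,y)}}(F) & \text{otherwise,} \end{cases}$

$\frac{\partial}{\partial_{(x,y)}}(G_1 + G_2) = \frac{\partial}{\partial_{(x,y)}}(G_1) \cup \frac{\partial}{\partial_{(x,y)}}(G_2)$,

where $H$ is any anti-morphism over $\Gamma^*$, $k > 0$ is any integer, $F$ is any
regular expression over $\Gamma$, $G_1$ and $G_2$ are any two hairpin expressions over
$\Gamma$, and for any set $\mathcal{H}$ of hairpin expressions: $\overrightarrow{H}_k(\mathcal{H}) = \{\overrightarrow{H}_k(G) \mid G \in \mathcal{H}\}$,
$\overleftarrow{H}_k(\mathcal{H}) = \{\overleftarrow{H}_k(G) \mid G \in \mathcal{H}\}$, $H'_k(\mathcal{H}) = \{H'_k(G) \mid G \in \mathcal{H}\}$.

Let $E$ be a hairpin expression over an alphabet $\Gamma$. The set of two-sided
derived terms of the expression $E$ is defined by:

$\overleftrightarrow{\mathcal{D}}_E = \bigcup_{k \ge 1} \overleftrightarrow{\mathcal{D}}^k_E$, where:

$\overleftrightarrow{\mathcal{D}}^k_E = \begin{cases} \bigcup_{(x, y) \in \Sigma_\Gamma} \frac{\partial}{\partial_{(x,y)}}(E) & \text{if } k = 1, \\ \bigcup_{(x, y) \in \Sigma_\Gamma,\, E' \in \overleftrightarrow{\mathcal{D}}^{k-1}_E} \frac{\partial}{\partial_{(x,y)}}(E') & \text{otherwise.} \end{cases}$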
Derived terms of regular expressions are related to left residuals. Let us show 
that derived terms of hairpin expressions are related to two-sided residuals. 

Proposition 6. Let $E$ be a hairpin expression over an alphabet $\Gamma$. Let $(x, y)$
be a couple of symbols in $\Gamma^2$. Then: $\bigcup_{F \in \frac{\partial}{\partial_{(x,y)}}(E)} L(F) = (x, y)^{-1}(L(E))$.

Furthermore, if $E$ is a regular expression, the proposition still holds whenever
$(x, y)$ is a couple of symbols in $\Sigma_\Gamma$.

Proof. Trivially proved by induction over the structure of $E$, according to
Proposition 4. □

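Determining whether the empty word belongs to the language denoted by a
regular expression $E$ can be performed syntactically and inductively as follows:

$\varepsilon \notin L(a)$, $\varepsilon \notin L(\emptyset)$, $\varepsilon \in L(\varepsilon)$,
$\varepsilon \in L(G_1 \cdot G_2) \Leftrightarrow \varepsilon \in L(G_1) \wedge \varepsilon \in L(G_2)$,
$\varepsilon \in L(G_1 + G_2) \Leftrightarrow \varepsilon \in L(G_1) \vee \varepsilon \in L(G_2)$, $\varepsilon \in L(G_1^*)$.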
This syntactical test is needed to compute the derived term automaton since 
it defines the finality of the states. We now show how to extend this computation 
to hairpin expressions. 

Lemma 7. Let $F$ be a regular expression and $G_1$ and $G_2$ be two hairpin
expressions. Then:

$\varepsilon \notin L(\overrightarrow{H}_k(F))$, $\varepsilon \notin L(\overleftarrow{H}_k(F))$, $\varepsilon \notin L(H'_k(F))$,
$\varepsilon \in L(G_1 + G_2) \Leftrightarrow \varepsilon \in L(G_1) \vee \varepsilon \in L(G_2)$.

Proof. Trivially proved according to Definition 4, Definition 5 and the definition of
languages denoted by hairpin expressions. □

The following example illustrates the computation of derived terms. For
clarity, in this example, we assume that hairpin expressions are quotiented w.r.t.
the following rules: $\varepsilon \cdot E \sim E$, $\emptyset \cdot E \sim \emptyset$. Moreover, sets of expressions are also
quotiented w.r.t. the following rule: $\{\emptyset\} \sim \emptyset$.
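Example 2. Let $\Gamma = \{a, b, c\}$ and $H$ be the anti-morphism over $\Gamma^*$ defined by
$H(a) = a$, $H(b) = c$ and $H(c) = b$. Let $E = \overrightarrow{H}_1(a^* b c)$. Derived terms of $E$ are
computed as follows:

$\frac{\partial}{\partial_{(a,a)}}(E) = \{E\}$,

$\frac{\partial}{\partial_{(b,c)}}(E) = \{\overrightarrow{H}_1(c), \varepsilon\}$,

$\frac{\partial}{\partial_{(c,b)}}(\overrightarrow{H}_1(c)) = \{\overrightarrow{H}_1(\varepsilon)\}$.

Other partial derivatives are equal to $\emptyset$. Furthermore, it holds that $\varepsilon$ is the
only derived term $F$ of $E$ such that $\varepsilon$ belongs to $L(F)$.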
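The derivatives of Example 2 can be reproduced with a short Python sketch of Definition 6 and Lemma 7. This is an illustration under stated assumptions rather than the authors' construction: expressions are nested tuples, the tags `"hr"`, `"hl"` and `"hp"` (hypothetical names) stand for the right, left and primed completion operators, `h` is a dictionary giving $H$ on symbols, and the smart constructor `cat` applies the quotient $\varepsilon \cdot E \sim E$ together with the symmetric rule $E \cdot \varepsilon \sim E$, which is an extra assumption. The function `accepts` is a recursive recognizer that follows derivatives instead of building the automaton explicitly.

```python
EPS = ("eps",)
HAIRPIN = ("hr", "hl", "hp")  # right, left and primed completions

def sym(a): return ("sym", a)
def alt(f, g): return ("alt", f, g)
def star(f): return ("star", f)
def cat(f, g):
    # Quotient eps . E ~ E and (assumption) E . eps ~ E.
    if f == EPS: return g
    if g == EPS: return f
    return ("cat", f, g)

def nullable(e):
    tag = e[0]
    if tag in ("eps", "star"): return True
    if tag == "alt": return nullable(e[1]) or nullable(e[2])
    if tag == "cat": return nullable(e[1]) and nullable(e[2])
    return False  # symbols, empty set and hairpin operators (Lemma 7)

def left_pd(e, a):
    tag = e[0]
    if tag == "sym": return {EPS} if e[1] == a else set()
    if tag in ("eps", "empty"): return set()
    if tag == "alt": return left_pd(e[1], a) | left_pd(e[2], a)
    if tag == "star": return {cat(d, e) for d in left_pd(e[1], a)}
    f, g = e[1], e[2]
    res = {cat(d, g) for d in left_pd(f, a)}
    return res | left_pd(g, a) if nullable(f) else res

def right_pd(e, a):
    tag = e[0]
    if tag == "sym": return {EPS} if e[1] == a else set()
    if tag in ("eps", "empty"): return set()
    if tag == "alt": return right_pd(e[1], a) | right_pd(e[2], a)
    if tag == "star": return {cat(e, d) for d in right_pd(e[1], a)}
    f, g = e[1], e[2]
    res = {cat(f, d) for d in right_pd(g, a)}
    return res | right_pd(f, a) if nullable(g) else res

def pd2_regular(f, x, y):
    """Two-sided derivative of a regular expression w.r.t. two symbols."""
    return {d for g in left_pd(f, x) for d in right_pd(g, y)}

def pd2(e, x, y, h):
    """Two-sided partial derivative w.r.t. a couple (x, y) of symbols."""
    tag = e[0]
    if tag not in HAIRPIN:
        return pd2_regular(e, x, y)
    k, f = e[1], e[2]
    if y != h[x]:
        return set()
    inner = pd2_regular(f, x, y)
    rest = inner if k == 1 else {("hp", k - 1, g) for g in inner}
    if tag == "hp":
        return rest
    side = left_pd(f, x) if tag == "hr" else right_pd(f, y)
    return {(tag, k, g) for g in side} | rest

def accepts(e, w, h):
    """Recursive membership test driven by derivatives."""
    if w == "":
        return nullable(e)
    if e[0] == "alt":
        return accepts(e[1], w, h) or accepts(e[2], w, h)
    if e[0] in HAIRPIN:
        return len(w) >= 2 and any(accepts(g, w[1:-1], h) for g in pd2(e, w[0], w[-1], h))
    return any(accepts(g, w[1:], h) for g in left_pd(e, w[0]))
```

With `h = {"a": "a", "b": "c", "c": "b"}`, `F = cat(star(sym("a")), cat(sym("b"), sym("c")))` and `E = ("hr", 1, F)`, the calls `pd2(E, "a", "a", h)`, `pd2(E, "b", "c", h)` and `pd2(("hr", 1, sym("c")), "c", "b", h)` correspond to the three derivatives listed in Example 2.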

In the following we are looking for an upper bound over the cardinality of 
the set of two-sided derived terms, thus we apply no reduction to the regular 
expressions. Notice that this cardinality decreases whenever any reduction is 
applied. 

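Lemma 8. Let $E$ and $F$ be two regular expressions over an alphabet $\Gamma$. Then
the three following propositions hold:

1. $\overleftrightarrow{\mathcal{D}}_{E+F} \subseteq \overleftrightarrow{\mathcal{D}}_E \cup \overleftrightarrow{\mathcal{D}}_F$,

2. $\overleftrightarrow{\mathcal{D}}_{E \cdot F} \subseteq \overleftarrow{\mathcal{D}}_E \cdot \overrightarrow{\mathcal{D}}_F \cup \overleftrightarrow{\mathcal{D}}_E \cup \overleftrightarrow{\mathcal{D}}_F$,

3. $\overleftrightarrow{\mathcal{D}}_{E^*} \subseteq \overleftarrow{\mathcal{D}}_E \cdot E^* \cup E^* \cdot \overrightarrow{\mathcal{D}}_E \cup \overleftrightarrow{\mathcal{D}}_E \cup (\overleftarrow{\mathcal{D}}_E \cdot E^*) \cdot \overrightarrow{\mathcal{D}}_E \cup \overleftarrow{\mathcal{D}}_E \cdot (E^* \cdot \overrightarrow{\mathcal{D}}_E)$.

Furthermore, $\overleftrightarrow{\mathcal{D}}_\varepsilon = \overleftrightarrow{\mathcal{D}}_\emptyset = \emptyset$ and $\overleftrightarrow{\mathcal{D}}_a = \{\varepsilon\}$ for any symbol $a$ in $\Gamma$.

Proof. Basic cases ($\varepsilon$, $\emptyset$ and $a$ in $\Gamma$) are trivially proved by directly applying
Definition 6.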

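By induction over the structure of the set of two-sided derived terms. Suppose
that $E$ and $F$ are two regular expressions over an alphabet $\Gamma$. Let $(x, y)$
be a couple of symbols in $\Sigma_\Gamma$.

1. Let us first show that $\frac{\partial}{\partial_{(x,y)}}(E + F) \subseteq \overleftrightarrow{\mathcal{D}}_E \cup \overleftrightarrow{\mathcal{D}}_F$. According to Definition 6,
it holds:

$\frac{\partial}{\partial_{(x,y)}}(E + F) = \begin{cases} \frac{\partial}{\partial_x}(E + F) & \text{if } y = \varepsilon, \\ (E + F)\frac{\partial}{\partial_y} & \text{if } x = \varepsilon, \\ \bigcup_{G \in \frac{\partial}{\partial_x}(E + F)} (G)\frac{\partial}{\partial_y} & \text{otherwise,} \end{cases}$

$= \begin{cases} \frac{\partial}{\partial_x}(E) \cup \frac{\partial}{\partial_x}(F) & \text{if } y = \varepsilon, \\ (E)\frac{\partial}{\partial_y} \cup (F)\frac{\partial}{\partial_y} & \text{if } x = \varepsilon, \\ \bigcup_{G \in \frac{\partial}{\partial_x}(E)} (G)\frac{\partial}{\partial_y} \cup \bigcup_{G \in \frac{\partial}{\partial_x}(F)} (G)\frac{\partial}{\partial_y} & \text{otherwise.} \end{cases}$

Notice that the three following conditions hold:

$\frac{\partial}{\partial_x}(E) \cup \frac{\partial}{\partial_x}(F) \subseteq \overleftarrow{\mathcal{D}}_E \cup \overleftarrow{\mathcal{D}}_F \subseteq \overleftrightarrow{\mathcal{D}}_E \cup \overleftrightarrow{\mathcal{D}}_F$,

$(E)\frac{\partial}{\partial_y} \cup (F)\frac{\partial}{\partial_y} \subseteq \overrightarrow{\mathcal{D}}_E \cup \overrightarrow{\mathcal{D}}_F \subseteq \overleftrightarrow{\mathcal{D}}_E \cup \overleftrightarrow{\mathcal{D}}_F$,

$\bigcup_{G \in \frac{\partial}{\partial_x}(E)} (G)\frac{\partial}{\partial_y} \cup \bigcup_{G \in \frac{\partial}{\partial_x}(F)} (G)\frac{\partial}{\partial_y} = \frac{\partial}{\partial_{(x,y)}}(E) \cup \frac{\partial}{\partial_{(x,y)}}(F) \subseteq \overleftrightarrow{\mathcal{D}}_E \cup \overleftrightarrow{\mathcal{D}}_F$.

As a consequence, $\frac{\partial}{\partial_{(x,y)}}(E + F) \subseteq \overleftrightarrow{\mathcal{D}}_E \cup \overleftrightarrow{\mathcal{D}}_F$.
Furthermore, since by definition of the sets of two-sided derived terms, for
any expression $G$ in $\overleftrightarrow{\mathcal{D}}_E$ (resp. in $\overleftrightarrow{\mathcal{D}}_F$), $\frac{\partial}{\partial_{(x,y)}}(G) \subseteq \overleftrightarrow{\mathcal{D}}_E$ (resp. $\frac{\partial}{\partial_{(x,y)}}(G) \subseteq \overleftrightarrow{\mathcal{D}}_F$),
the proposition is satisfied.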
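2. Let us set $\mathcal{E} = \overleftarrow{\mathcal{D}}_E \cdot \overrightarrow{\mathcal{D}}_F \cup \overleftrightarrow{\mathcal{D}}_E \cup \overleftrightarrow{\mathcal{D}}_F$.

(a) Let us first show that $\frac{\partial}{\partial_{(x,y)}}(E \cdot F) \subseteq \mathcal{E}$.
According to Definition 6, it holds:

$\frac{\partial}{\partial_{(x,y)}}(E \cdot F) = \begin{cases} \frac{\partial}{\partial_x}(E \cdot F) & \text{if } y = \varepsilon, \\ (E \cdot F)\frac{\partial}{\partial_y} & \text{if } x = \varepsilon, \\ \bigcup_{G \in \frac{\partial}{\partial_x}(E \cdot F)} (G)\frac{\partial}{\partial_y} & \text{otherwise,} \end{cases}$

$= \begin{cases} \frac{\partial}{\partial_x}(E) \cdot F \cup \frac{\partial}{\partial_x}(F) & \text{if } y = \varepsilon \wedge \varepsilon \in L(E), \\ \frac{\partial}{\partial_x}(E) \cdot F & \text{if } y = \varepsilon \wedge \varepsilon \notin L(E), \\ (E)\frac{\partial}{\partial_y} \cup E \cdot (F)\frac{\partial}{\partial_y} & \text{if } x = \varepsilon \wedge \varepsilon \in L(F), \\ E \cdot (F)\frac{\partial}{\partial_y} & \text{if } x = \varepsilon \wedge \varepsilon \notin L(F), \\ \bigcup_{G \in \frac{\partial}{\partial_x}(E) \cdot F} (G)\frac{\partial}{\partial_y} \cup \bigcup_{G \in \frac{\partial}{\partial_x}(F)} (G)\frac{\partial}{\partial_y} & \text{if } x, y \in \Gamma \wedge \varepsilon \in L(E), \\ \bigcup_{G \in \frac{\partial}{\partial_x}(E) \cdot F} (G)\frac{\partial}{\partial_y} & \text{otherwise.} \end{cases}$

Notice that the three following conditions hold:

$\frac{\partial}{\partial_x}(E) \cdot F \cup \frac{\partial}{\partial_x}(F) \subseteq \overleftarrow{\mathcal{D}}_E \cdot \overrightarrow{\mathcal{D}}_F \cup \overleftrightarrow{\mathcal{D}}_F$,

$(E)\frac{\partial}{\partial_y} \cup E \cdot (F)\frac{\partial}{\partial_y} \subseteq \overleftrightarrow{\mathcal{D}}_E \cup \overleftarrow{\mathcal{D}}_E \cdot \overrightarrow{\mathcal{D}}_F$,

$\bigcup_{G \in \frac{\partial}{\partial_x}(F)} (G)\frac{\partial}{\partial_y} = \frac{\partial}{\partial_{(x,y)}}(F) \subseteq \overleftrightarrow{\mathcal{D}}_F$.

Moreover,

$\bigcup_{G \in \frac{\partial}{\partial_x}(E) \cdot F} (G)\frac{\partial}{\partial_y} = \begin{cases} \bigcup_{G \in \frac{\partial}{\partial_x}(E)} (G)\frac{\partial}{\partial_y} \cup G \cdot (F)\frac{\partial}{\partial_y} & \text{if } \varepsilon \in L(F), \\ \bigcup_{G \in \frac{\partial}{\partial_x}(E)} G \cdot (F)\frac{\partial}{\partial_y} & \text{otherwise.} \end{cases}$

Finally, since $\bigcup_{G \in \frac{\partial}{\partial_x}(E)} G \cdot (F)\frac{\partial}{\partial_y} = \frac{\partial}{\partial_x}(E) \cdot (F)\frac{\partial}{\partial_y} \subseteq \overleftarrow{\mathcal{D}}_E \cdot \overrightarrow{\mathcal{D}}_F$ and
since $\bigcup_{G \in \frac{\partial}{\partial_x}(E)} (G)\frac{\partial}{\partial_y} = \frac{\partial}{\partial_{(x,y)}}(E) \subseteq \overleftrightarrow{\mathcal{D}}_E$, the proposition is satisfied.

(b) Let us now show that for any expression $G$ in $\mathcal{E}$, $\frac{\partial}{\partial_{(x,y)}}(G) \subseteq \mathcal{E}$.

i. If $G$ belongs to $\overleftrightarrow{\mathcal{D}}_E$ (resp. to $\overleftrightarrow{\mathcal{D}}_F$), by definition of the set of two-sided
derived terms it holds that $\frac{\partial}{\partial_{(x,y)}}(G) \subseteq \overleftrightarrow{\mathcal{D}}_E$ (resp. $\frac{\partial}{\partial_{(x,y)}}(G) \subseteq \overleftrightarrow{\mathcal{D}}_F$).

ii. If $G$ belongs to $\overleftarrow{\mathcal{D}}_E \cdot \overrightarrow{\mathcal{D}}_F$, then $G = G_1 \cdot G_2$ and from (2a) it
holds that $\frac{\partial}{\partial_{(x,y)}}(G) \subseteq \overleftarrow{\mathcal{D}}_{G_1} \cdot \overrightarrow{\mathcal{D}}_{G_2} \cup \overleftrightarrow{\mathcal{D}}_{G_1} \cup \overleftrightarrow{\mathcal{D}}_{G_2}$. According to the
definition of the set of two-sided derived terms, the four following
conditions hold:

$\overleftarrow{\mathcal{D}}_{G_1} \subseteq \overleftarrow{\mathcal{D}}_E$, $\overrightarrow{\mathcal{D}}_{G_2} \subseteq \overrightarrow{\mathcal{D}}_F$, $\overleftrightarrow{\mathcal{D}}_{G_1} \subseteq \overleftrightarrow{\mathcal{D}}_E$ and $\overleftrightarrow{\mathcal{D}}_{G_2} \subseteq \overleftrightarrow{\mathcal{D}}_F$.

As a consequence, the proposition is satisfied.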

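3. Let us set $\mathcal{E} = \overleftarrow{\mathcal{D}}_E \cdot E^* \cup E^* \cdot \overrightarrow{\mathcal{D}}_E \cup \overleftrightarrow{\mathcal{D}}_E \cup (\overleftarrow{\mathcal{D}}_E \cdot E^*) \cdot \overrightarrow{\mathcal{D}}_E \cup \overleftarrow{\mathcal{D}}_E \cdot (E^* \cdot \overrightarrow{\mathcal{D}}_E)$.

(a) Let us first show that $\frac{\partial}{\partial_{(x,y)}}(E^*) \subseteq \mathcal{E}$. According to Definition 6 it
holds:

\[ \frac{\partial}{\partial_{(x,y)}}(E^*) = \begin{cases} \frac{\partial}{\partial_x}(E^*) & \text{if } y = \varepsilon, \\ (E^*)\frac{\partial}{\partial_y} & \text{if } x = \varepsilon, \\ \bigcup_{G \in \frac{\partial}{\partial_x}(E^*)} (G)\frac{\partial}{\partial_y} & \text{otherwise,} \end{cases} \]

\[ = \begin{cases} \frac{\partial}{\partial_x}(E) \cdot E^* & \text{if } y = \varepsilon, \\ E^* \cdot (E)\frac{\partial}{\partial_y} & \text{if } x = \varepsilon, \\ \bigcup_{G \in \frac{\partial}{\partial_x}(E)} (G \cdot E^*)\frac{\partial}{\partial_y} & \text{otherwise.} \end{cases} \]

Notice that $\frac{\partial}{\partial_x}(E) \cdot E^* \subseteq \overleftarrow{\mathcal{D}}_E \cdot E^*$ and that $E^* \cdot (E)\frac{\partial}{\partial_y} \subseteq E^* \cdot \overrightarrow{\mathcal{D}}_E$.
Moreover,

\[ \bigcup_{G \in \frac{\partial}{\partial_x}(E)} (G \cdot E^*)\frac{\partial}{\partial_y} = \bigcup_{G \in \frac{\partial}{\partial_x}(E)} \left( (G)\frac{\partial}{\partial_y} \cup G \cdot (E^* \cdot (E)\frac{\partial}{\partial_y}) \right) \]

\[ = \bigcup_{G \in \frac{\partial}{\partial_x}(E)} (G)\frac{\partial}{\partial_y} \cup \bigcup_{G \in \frac{\partial}{\partial_x}(E)} G \cdot (E^* \cdot (E)\frac{\partial}{\partial_y}). \]

Finally, since the two following conditions hold:

\[ \bigcup_{G \in \frac{\partial}{\partial_x}(E)} (G)\frac{\partial}{\partial_y} = \frac{\partial}{\partial_{(x,y)}}(E) \subseteq \overleftrightarrow{\mathcal{D}}_E \]

and

\[ \bigcup_{G \in \frac{\partial}{\partial_x}(E)} G \cdot (E^* \cdot (E)\frac{\partial}{\partial_y}) = \frac{\partial}{\partial_x}(E) \cdot (E^* \cdot (E)\frac{\partial}{\partial_y}) \subseteq \overleftarrow{\mathcal{D}}_E \cdot (E^* \cdot \overrightarrow{\mathcal{D}}_E), \]

it holds that $\frac{\partial}{\partial_{(x,y)}}(E^*) \subseteq \mathcal{E}$.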

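(b) Let us now show that for any expression G in $\mathcal{E}$, $\frac{\partial}{\partial_{(x,y)}}(G) \subseteq \mathcal{E}$.

i. If G belongs to $\overleftrightarrow{\mathcal{D}}_E$, by definition of the set of two-sided derived
terms it holds $\frac{\partial}{\partial_{(x,y)}}(G) \subseteq \overleftrightarrow{\mathcal{D}}_E$.

ii. If G belongs to $\overleftarrow{\mathcal{D}}_E \cdot E^*$, then $G = G_1 \cdot E^*$ and from (2a) it holds
that:

\[ \frac{\partial}{\partial_{(x,y)}}(G) \subseteq \frac{\partial}{\partial_x}(G_1) \cdot (E^*)\frac{\partial}{\partial_y} \cup \frac{\partial}{\partial_{(x,y)}}(G_1) \cup \frac{\partial}{\partial_{(x,y)}}(E^*). \]

Moreover, since from (3a) $\frac{\partial}{\partial_{(x,y)}}(E^*) \subseteq \mathcal{E}$, since $\frac{\partial}{\partial_x}(G_1) \subseteq \overleftarrow{\mathcal{D}}_E$
and since $\frac{\partial}{\partial_{(x,y)}}(G_1) \subseteq \overleftrightarrow{\mathcal{D}}_E$, it holds that:

\[ \frac{\partial}{\partial_{(x,y)}}(G) \subseteq \overleftarrow{\mathcal{D}}_E \cdot (E^* \cdot \overrightarrow{\mathcal{D}}_E) \cup \overleftrightarrow{\mathcal{D}}_E \cup \mathcal{E} \subseteq \mathcal{E}. \]

iii. If G belongs to $E^* \cdot \overrightarrow{\mathcal{D}}_E$, then $G = E^* \cdot G_1$ and from (2a) it holds
that:

\[ \frac{\partial}{\partial_{(x,y)}}(G) \subseteq \frac{\partial}{\partial_x}(E^*) \cdot (G_1)\frac{\partial}{\partial_y} \cup \frac{\partial}{\partial_{(x,y)}}(E^*) \cup \frac{\partial}{\partial_{(x,y)}}(G_1). \]

Moreover, since from (3a) $\frac{\partial}{\partial_{(x,y)}}(E^*) \subseteq \mathcal{E}$, since $(G_1)\frac{\partial}{\partial_y} \subseteq \overrightarrow{\mathcal{D}}_E$
and since $\frac{\partial}{\partial_{(x,y)}}(G_1) \subseteq \overleftrightarrow{\mathcal{D}}_E$:

\[ \frac{\partial}{\partial_{(x,y)}}(G) \subseteq (\overleftarrow{\mathcal{D}}_E \cdot E^*) \cdot \overrightarrow{\mathcal{D}}_E \cup \mathcal{E} \cup \overleftrightarrow{\mathcal{D}}_E \subseteq \mathcal{E}. \]

iv. If G belongs to $(\overleftarrow{\mathcal{D}}_E \cdot E^*) \cdot \overrightarrow{\mathcal{D}}_E$, then $G = (G_1 \cdot E^*) \cdot G_2$ and from
(2a) it holds that:

\[ \frac{\partial}{\partial_{(x,y)}}(G) \subseteq \frac{\partial}{\partial_x}(G_1 \cdot E^*) \cdot (G_2)\frac{\partial}{\partial_y} \cup \frac{\partial}{\partial_{(x,y)}}(G_1 \cdot E^*) \cup \frac{\partial}{\partial_{(x,y)}}(G_2). \]

Since $\frac{\partial}{\partial_x}(G_1 \cdot E^*) \subseteq \frac{\partial}{\partial_x}(G_1) \cdot E^* \cup \frac{\partial}{\partial_x}(E) \cdot E^*$, it holds that:

\[ \frac{\partial}{\partial_x}(G_1 \cdot E^*) \cdot (G_2)\frac{\partial}{\partial_y} \subseteq (\frac{\partial}{\partial_x}(G_1) \cdot E^*) \cdot (G_2)\frac{\partial}{\partial_y} \cup (\frac{\partial}{\partial_x}(E) \cdot E^*) \cdot (G_2)\frac{\partial}{\partial_y}. \]

Finally, since from (3bii) $\frac{\partial}{\partial_{(x,y)}}(G_1 \cdot E^*) \subseteq \mathcal{E}$, it holds:

\[ \frac{\partial}{\partial_{(x,y)}}(G) \subseteq \mathcal{E}. \]

v. If G belongs to $\overleftarrow{\mathcal{D}}_E \cdot (E^* \cdot \overrightarrow{\mathcal{D}}_E)$, then $G = G_1 \cdot (E^* \cdot G_2)$ and from
(2a) it holds that:

\[ \frac{\partial}{\partial_{(x,y)}}(G) \subseteq \frac{\partial}{\partial_x}(G_1) \cdot (E^* \cdot G_2)\frac{\partial}{\partial_y} \cup \frac{\partial}{\partial_{(x,y)}}(G_1) \cup \frac{\partial}{\partial_{(x,y)}}(E^* \cdot G_2). \]

Since $(E^* \cdot G_2)\frac{\partial}{\partial_y} \subseteq (E^*)\frac{\partial}{\partial_y} \cup E^* \cdot (G_2)\frac{\partial}{\partial_y}$, it holds that:

\[ \frac{\partial}{\partial_x}(G_1) \cdot (E^* \cdot G_2)\frac{\partial}{\partial_y} \subseteq \frac{\partial}{\partial_x}(G_1) \cdot (E^*)\frac{\partial}{\partial_y} \cup \frac{\partial}{\partial_x}(G_1) \cdot (E^* \cdot (G_2)\frac{\partial}{\partial_y}). \]

Finally, since from (3biii) $\frac{\partial}{\partial_{(x,y)}}(E^* \cdot G_2) \subseteq \mathcal{E}$, it holds:

\[ \frac{\partial}{\partial_{(x,y)}}(G) \subseteq \overleftarrow{\mathcal{D}}_E \cdot (E^* \cdot \overrightarrow{\mathcal{D}}_E) \cup \mathcal{E} \cup \overleftrightarrow{\mathcal{D}}_E \subseteq \mathcal{E}. \]
As a consequence, the proposition is satisfied.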

□ 

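Proposition 7. Let E be a regular expression of width n > 0 and of star number
h. Let us set m = n + h. Then the three following propositions hold:

1. $\mathrm{Card}(\overleftarrow{\mathcal{D}}_E) \leq n$,

2. $\mathrm{Card}(\overrightarrow{\mathcal{D}}_E) \leq n$,

3. $\mathrm{Card}(\overleftrightarrow{\mathcal{D}}_E) \leq \frac{2m \times (m+1) \times (m+2)}{3} - 3$.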

Proof. For the set of left derived terms, the proposition is proved in [1], where
it is shown that the cardinality of the set $\{E' \mid \exists w \in \Gamma^+, E' \in \frac{\partial}{\partial_w}(E)\}$ is at
most n. This bound still holds for the set of right derived terms.

Let $n_1$ (resp. $n_2$) be the width of a regular expression F (resp. G) and $h_1$
(resp. $h_2$) be the star number of F (resp. G). Let us set $m_1 = n_1 + h_1$ and
$m_2 = n_2 + h_2$. For E = F + G and for E = F · G, we have $n = n_1 + n_2$,
$h = h_1 + h_2$ and $m = m_1 + m_2$. For $E = F^*$, we have $n = n_1$, $h = h_1 + 1$ and
$m = m_1 + 1$.

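According to Lemma 8 we get:

1. $\overleftrightarrow{\mathcal{D}}_{F+G} \subseteq \overleftrightarrow{\mathcal{D}}_F \cup \overleftrightarrow{\mathcal{D}}_G$,

2. $\overleftrightarrow{\mathcal{D}}_{F \cdot G} \subseteq \overleftarrow{\mathcal{D}}_F \cdot \overrightarrow{\mathcal{D}}_G \cup \overleftrightarrow{\mathcal{D}}_F \cup \overleftrightarrow{\mathcal{D}}_G$,

3. $\overleftrightarrow{\mathcal{D}}_{F^*} \subseteq \overleftarrow{\mathcal{D}}_F \cdot F^* \cup F^* \cdot \overrightarrow{\mathcal{D}}_F \cup \overleftrightarrow{\mathcal{D}}_F \cup (\overleftarrow{\mathcal{D}}_F \cdot F^*) \cdot \overrightarrow{\mathcal{D}}_F \cup \overleftarrow{\mathcal{D}}_F \cdot (F^* \cdot \overrightarrow{\mathcal{D}}_F)$.
As a consequence, we get:

1. $\mathrm{Card}(\overleftrightarrow{\mathcal{D}}_{F+G}) \leq \mathrm{Card}(\overleftrightarrow{\mathcal{D}}_F) + \mathrm{Card}(\overleftrightarrow{\mathcal{D}}_G)$,

2. $\mathrm{Card}(\overleftrightarrow{\mathcal{D}}_{F \cdot G}) \leq \mathrm{Card}(\overleftrightarrow{\mathcal{D}}_F) + \mathrm{Card}(\overleftrightarrow{\mathcal{D}}_G) + n_1 n_2$,

3. $\mathrm{Card}(\overleftrightarrow{\mathcal{D}}_{F^*}) \leq \mathrm{Card}(\overleftrightarrow{\mathcal{D}}_F) + 2n_1(n_1 + 1)$.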

On the one hand, the cardinality of $\overleftrightarrow{\mathcal{D}}_{F^*}$ may be strictly greater than the cardinality
of $\overleftrightarrow{\mathcal{D}}_F$ although F and $F^*$ have the same width $n_1$; we therefore substitute the
parameter $m_1 = n_1 + h_1$ for $n_1$, so that $F^*$ is associated with $m_1 + 1$.

On the other hand, the maximal increase of the cardinality of $\overleftrightarrow{\mathcal{D}}_E$ (w.r.t. m)
occurs in the star case; we therefore consider the function $\phi$ such that:

1. $\phi(0) = 0$ and $\phi(1) = 1$,

2. $\phi(k + 1) = \phi(k) + 2 \times k \times (k + 1)$,

and we show that $\mathrm{Card}(\overleftrightarrow{\mathcal{D}}_E) \leq \phi(m)$ for any regular expression E.

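According to Lemma 8 and by induction hypothesis, it holds:

1. $\mathrm{Card}(\overleftrightarrow{\mathcal{D}}_{F+G}) \leq \phi(m_1) + \phi(m_2)$,

2. $\mathrm{Card}(\overleftrightarrow{\mathcal{D}}_{F \cdot G}) \leq \phi(m_1) + \phi(m_2) + n_1 \times n_2$,

3. $\mathrm{Card}(\overleftrightarrow{\mathcal{D}}_{F^*}) \leq \phi(m_1) + 2n_1(n_1 + 1)$.

It can be checked that:

\[ \phi(m_1) + \phi(m_2) \leq \phi(m_1) + \phi(m_2) + n_1 \times n_2 \leq \phi(m_1 + m_2). \]
As a consequence, it holds:

1. $\mathrm{Card}(\overleftrightarrow{\mathcal{D}}_{F+G}) \leq \phi(m_1 + m_2)$,

2. $\mathrm{Card}(\overleftrightarrow{\mathcal{D}}_{F \cdot G}) \leq \phi(m_1 + m_2)$.

Furthermore, by definition of $\phi$ and since $m_1 \geq n_1$, it holds:

\[ \phi(m_1) + 2n_1(n_1 + 1) \leq \phi(m_1) + 2m_1(m_1 + 1) = \phi(m_1 + 1) \]

and consequently $\mathrm{Card}(\overleftrightarrow{\mathcal{D}}_{F^*}) \leq \phi(m_1 + 1)$.

Finally, since $\sum_{j=1}^{k} j(j + 1) = \frac{k(k+1)(k+2)}{3}$, it holds for every integer $k \geq 1$:

\[ \phi(k) \leq \frac{2k(k+1)(k+2)}{3} - 3. \]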

□ 
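The arithmetic behind this bound can be checked numerically. The following Python sketch is only an illustration: the function names `phi` and `closed_form_bound` are hypothetical, and the recurrence is the one stated in the proof ($\phi(0) = 0$, $\phi(1) = 1$, $\phi(k+1) = \phi(k) + 2k(k+1)$).

```python
def phi(k: int) -> int:
    """phi(0) = 0, phi(1) = 1 and phi(k + 1) = phi(k) + 2 * k * (k + 1)."""
    if k == 0:
        return 0
    value = 1  # phi(1)
    for j in range(1, k):
        value += 2 * j * (j + 1)  # step from phi(j) to phi(j + 1)
    return value


def closed_form_bound(k: int) -> int:
    """2k(k+1)(k+2)/3 - 3; three consecutive integers always contain a multiple of 3."""
    return 2 * k * (k + 1) * (k + 2) // 3 - 3


def sum_identity_holds(k: int) -> bool:
    """Check sum_{j=1..k} j(j+1) = k(k+1)(k+2)/3."""
    return 3 * sum(j * (j + 1) for j in range(1, k + 1)) == k * (k + 1) * (k + 2)


if __name__ == "__main__":
    for k in range(1, 8):
        print(k, phi(k), closed_form_bound(k))
```

On small values the recurrence stays below the closed form, with equality at k = 1.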

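Proposition 8. Let E be a regular expression over an alphabet $\Gamma$, H be an
anti-morphism over $\Gamma^*$ and k > 0 be an integer. Then:
1. $\mathrm{Card}(\overleftrightarrow{\mathcal{D}}_{H'_k(E)}) \leq k \times \mathrm{Card}(\overleftrightarrow{\mathcal{D}}_E)$,

2. $\mathrm{Card}(\overleftrightarrow{\mathcal{D}}_{\overrightarrow{H}_k(E)}) \leq \mathrm{Card}(\overleftarrow{\mathcal{D}}_E) + k \times \mathrm{Card}(\overleftrightarrow{\mathcal{D}}_E)$,

3. $\mathrm{Card}(\overleftrightarrow{\mathcal{D}}_{\overleftarrow{H}_k(E)}) \leq \mathrm{Card}(\overrightarrow{\mathcal{D}}_E) + k \times \mathrm{Card}(\overleftrightarrow{\mathcal{D}}_E)$.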

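Proof. Let E be a regular expression.

(1) Let us set $\mathcal{E} = \{H'_{k'}(E') \mid E' \in \overleftrightarrow{\mathcal{D}}_E \wedge k' < k\} \cup \overleftrightarrow{\mathcal{D}}_E$. Let us show that
$\overleftrightarrow{\mathcal{D}}_{H'_k(E)} \subseteq \mathcal{E}$.

(a) According to Definition 6, for any couple (x, y) in $\Sigma_\Gamma$, $\frac{\partial}{\partial_{(x,y)}}(H'_k(E)) \subseteq \mathcal{E}$.

(b) Let us show that any derived term of an expression G in $\mathcal{E}$ belongs to
$\mathcal{E}$.

(i) If G belongs to $\overleftrightarrow{\mathcal{D}}_E$, so do its derived terms.

(ii) If $G \in \{H'_{k'}(E') \mid E' \in \overleftrightarrow{\mathcal{D}}_E \wedge k' < k\}$, then $G = H'_{k'}(G_1)$ with $G_1 \in \overleftrightarrow{\mathcal{D}}_E$
and from Definition 6 it holds:

\[ \frac{\partial}{\partial_{(x,y)}}(G) \subseteq \{H'_{k''}(G_2) \mid G_2 \in \overleftrightarrow{\mathcal{D}}_{G_1} \wedge k'' < k'\} \cup \overleftrightarrow{\mathcal{D}}_{G_1}. \]

By definition of $G_1$, $\overleftrightarrow{\mathcal{D}}_{G_1} \subseteq \overleftrightarrow{\mathcal{D}}_E$. Consequently $\frac{\partial}{\partial_{(x,y)}}(G) \subseteq \mathcal{E}$.

(c) Finally, since $\mathrm{Card}(\mathcal{E}) = (k - 1) \times \mathrm{Card}(\overleftrightarrow{\mathcal{D}}_E) + \mathrm{Card}(\overleftrightarrow{\mathcal{D}}_E)$, the proposition
holds.

(2) Let us set $\mathcal{E} = \{\overrightarrow{H}_k(E') \mid E' \in \overleftarrow{\mathcal{D}}_E\} \cup \{H'_{k'}(E') \mid E' \in \overleftrightarrow{\mathcal{D}}_E \wedge k' < k\} \cup \overleftrightarrow{\mathcal{D}}_E$.
Let us show that $\overleftrightarrow{\mathcal{D}}_{\overrightarrow{H}_k(E)} \subseteq \mathcal{E}$.

(a) According to Definition 6, for any couple (x, y) in $\Sigma_\Gamma$, $\frac{\partial}{\partial_{(x,y)}}(\overrightarrow{H}_k(E)) \subseteq \mathcal{E}$.

(b) Let us show that any derived term of an expression G in $\mathcal{E}$ belongs to $\mathcal{E}$.

(i) If G belongs to $\{\overrightarrow{H}_k(E') \mid E' \in \overleftarrow{\mathcal{D}}_E\}$, then $G = \overrightarrow{H}_k(G_1)$ with $G_1 \in \overleftarrow{\mathcal{D}}_E$
and from Definition 6 it holds that:

\[ \frac{\partial}{\partial_{(x,y)}}(G) \subseteq \{\overrightarrow{H}_k(G_2) \mid G_2 \in \overleftarrow{\mathcal{D}}_{G_1}\} \cup \{H'_{k'}(G_2) \mid G_2 \in \overleftrightarrow{\mathcal{D}}_{G_1} \wedge k' < k\} \cup \overleftrightarrow{\mathcal{D}}_{G_1}. \]

Since by definition of $G_1$, $\overleftarrow{\mathcal{D}}_{G_1} \subseteq \overleftarrow{\mathcal{D}}_E$ and $\overleftrightarrow{\mathcal{D}}_{G_1} \subseteq \overleftrightarrow{\mathcal{D}}_E$, it holds: $\frac{\partial}{\partial_{(x,y)}}(G) \subseteq \mathcal{E}$.

(ii) If G belongs to $\{H'_{k'}(E') \mid E' \in \overleftrightarrow{\mathcal{D}}_E \wedge k' < k\}$, then $G = H'_{k'}(G_1)$ with
$G_1 \in \overleftrightarrow{\mathcal{D}}_E$ and from Definition 6 it holds:

\[ \frac{\partial}{\partial_{(x,y)}}(G) \subseteq \{H'_{k''}(G_2) \mid G_2 \in \overleftrightarrow{\mathcal{D}}_{G_1} \wedge k'' < k'\} \cup \overleftrightarrow{\mathcal{D}}_{G_1}. \]

By definition of $G_1$, $\overleftrightarrow{\mathcal{D}}_{G_1} \subseteq \overleftrightarrow{\mathcal{D}}_E$. Hence $\frac{\partial}{\partial_{(x,y)}}(G) \subseteq \mathcal{E}$.

(iii) If G belongs to $\overleftrightarrow{\mathcal{D}}_E$, so do its derived terms.

(c) Finally, since $\mathrm{Card}(\mathcal{E}) = \mathrm{Card}(\overleftarrow{\mathcal{D}}_E) + (k - 1) \times \mathrm{Card}(\overleftrightarrow{\mathcal{D}}_E) + \mathrm{Card}(\overleftrightarrow{\mathcal{D}}_E)$,
the proposition holds.

(3) The proof is similar to that of case (2), with $\overrightarrow{\mathcal{D}}_E$ playing the role of $\overleftarrow{\mathcal{D}}_E$.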

□ 

The index of a hairpin expression E is the integer index(E) inductively
defined by:

index(F) = 0,
index($\overleftarrow{H}_k(F)$) = k, index($\overrightarrow{H}_k(F)$) = k, index($H'_k(F)$) = k,
index($G_1 + G_2$) = max(index($G_1$), index($G_2$)),
where H is any anti-morphism over $\Gamma^*$, k > 0 is any integer, F is any regular
expression over $\Gamma$, and $G_1$ and $G_2$ are any two hairpin expressions over $\Gamma$.

Proposition 9. Let E be a hairpin expression over an alphabet $\Gamma$. Then $\overleftrightarrow{\mathcal{D}}_E$ is
a finite set the cardinal of which is upper bounded by $k \times \left(\frac{2m(m+1)(m+2)}{3} - 3\right) + n$,
where k is the index of E, and m = n + h with n its width and h its star number.

Proof. Directly deduced from Proposition 7 and from Proposition 8 for the non-sum
cases. Whenever $E = G_1 + G_2$, let us set for $i \in \{1, 2\}$, $n_i$ the width of $G_i$,
$h_i$ its star number, $k_i$ its index and $m_i = n_i + h_i$. Without loss of generality
suppose that $k_1 \geq k_2$. Let $\phi$ be the function defined by:

\[ \phi(k) = \begin{cases} 0 & \text{if } k = 0, \\ \frac{2k(k+1)(k+2)}{3} - 3 & \text{otherwise.} \end{cases} \]

It can be checked that the following proposition P holds:

\[ \phi(k_1 + k_2) \geq \phi(k_1) + \phi(k_2). \]
By induction and from P it holds:

\[ \mathrm{Card}(\overleftrightarrow{\mathcal{D}}_{G_1}) + \mathrm{Card}(\overleftrightarrow{\mathcal{D}}_{G_2}) \leq k_1 \times \phi(m_1) + n_1 + k_2 \times \phi(m_2) + n_2 \]

\[ \leq k_1 \times (\phi(m_1) + \phi(m_2)) + n \]

\[ \leq k_1 \times \phi(m_1 + m_2) + n. \]

□ 

This finite set of two-sided derived terms allows us to extend the finite de- 
rived term automaton to hairpin expressions. 

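Definition 7. Let E be a hairpin expression over an alphabet $\Gamma$. Let A =
$(\Sigma_\Gamma, Q, I, F, \delta)$ be the NFA defined by:

• $Q = \{E\} \cup \overleftrightarrow{\mathcal{D}}_E$,

• $I = \{E\}$,

• $F = \{E' \in Q \mid \varepsilon \in L(E')\}$,

• $\forall (x, y) \in \Sigma_\Gamma$, $\forall E' \in Q$, $\delta(E', (x, y)) = \frac{\partial}{\partial_{(x,y)}}(E')$.

The automaton A is the two-sided derived term automaton of E.

By construction, A is a $\Gamma$-couple NFA where $\Gamma$ is the alphabet of E.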

Example 3. Let E be the hairpin expression of Example 2. The derived term
automaton of E is the automaton presented in Figure 3.

[Diagram of the automaton; its transitions are labeled (a, a), (b, c), (b, c) and (c, b).]

Figure 3: The Derived Term Automaton of the Expression E. 
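To make the behaviour of a couple NFA concrete, here is a minimal Python sketch of a recognizer. It assumes the acceptance semantics used in this paper's proofs (a transition labeled (x, y) consumes x at the left end and y at the right end of the remaining word, and (ε, ε) is not a label). The transition table is an assumption read off from the derived terms of the example with H(a) = a, H(b) = c, H(c) = b; the state names and the function `accepts` are hypothetical.

```python
from functools import lru_cache

# Assumed transition table: (state, (x, y)) -> set of target states.
# The empty string "" stands for the empty word.
DELTA = {
    ("E", ("a", "a")): {"E"},
    ("E", ("b", "c")): {"H1(c)", "eps"},
    ("H1(c)", ("c", "b")): {"H1(eps)"},
}
INITIAL = {"E"}
FINAL = {"eps"}  # only the state eps has the empty word in its language


def accepts(word, delta=DELTA, initial=INITIAL, final=FINAL):
    """True if some path consumes the whole word, x on the left and y on the right."""

    @lru_cache(maxsize=None)
    def run(state, rest):
        if rest == "" and state in final:
            return True
        for (source, (x, y)), targets in delta.items():
            # x and y must not overlap inside the remaining word
            if source != state or len(rest) < len(x) + len(y):
                continue
            if rest.startswith(x) and rest.endswith(y):
                inner = rest[len(x):len(rest) - len(y)]
                if any(run(target, inner) for target in targets):
                    return True
        return False

    return any(run(state, word) for state in initial)


if __name__ == "__main__":
    for w in ["bc", "abca", "aabcaa", "abc", ""]:
        print(repr(w), accepts(w))
```

Because every label removes at least one symbol, the recursion terminates; memoizing on (state, remaining word) keeps the search polynomial in the length of the word.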
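Proposition 10. Let E be a hairpin expression over an alphabet $\Gamma$ and A be
the two-sided derived term automaton of E. Then: $L(E) = L_\Gamma(A)$.

Proof. Let $A = (\Sigma_\Gamma, Q, I, F, \delta)$, let w be a word in $\Gamma^*$ and let E' be a state in
Q. Let us show that the following proposition (P) is satisfied: $w \in \overrightarrow{L}_\Gamma(E') \Leftrightarrow
w \in L(E')$. By recurrence over the length of w.

(I) If $w = \varepsilon$, then:
$w \in \overrightarrow{L}_\Gamma(E')$

$\Leftrightarrow E' \in F$ (Lemma [?])

$\Leftrightarrow \varepsilon \in L(E')$ (Construction of A)

$\Leftrightarrow w \in L(E')$.

(II) Let us suppose that |w| > 0. Then $\exists (x, y) \in \Sigma_\Gamma$, $\exists w' \in \Gamma^*$ such that
w = xw'y.

(a) If E' is a simple hairpin expression, then
$xw'y \in \overrightarrow{L}_\Gamma(E')$

$\Leftrightarrow w' \in (x, y)^{-1}(\overrightarrow{L}_\Gamma(E'))$ (Definition 1)

$\Leftrightarrow w' \in \bigcup_{E'' \in \delta(E', (x,y))} \overrightarrow{L}_\Gamma(E'')$ (Corollary [?])

$\Leftrightarrow w' \in \bigcup_{E'' \in \frac{\partial}{\partial_{(x,y)}}(E')} \overrightarrow{L}_\Gamma(E'')$ (Construction of A)

$\Leftrightarrow w' \in \bigcup_{E'' \in \frac{\partial}{\partial_{(x,y)}}(E')} L(E'')$ (Recurrence hypothesis)

$\Leftrightarrow w' \in (x, y)^{-1}(L(E'))$ (Proposition [?])

$\Leftrightarrow xw'y \in L(E')$ (Definition 1) $\Leftrightarrow w \in L(E')$.

(b) If $E' \in \{\overleftarrow{H}_k(F), \overrightarrow{H}_k(F), H'_k(F)\}$, then it holds $w \in L(E') \Rightarrow y = H(x)$
(according to Lemma 5). Consequently, if $y \neq H(x)$, $\delta(E', (x, y)) = \emptyset$ and $w \notin
\overrightarrow{L}_\Gamma(E')$. Hence, since $w \notin L(E')$, the proposition is satisfied. Let us now suppose
that y = H(x). Since $(\varepsilon, \varepsilon) \notin \Sigma_\Gamma$, $(x, y) \in \Gamma \times \Gamma$.

$xw'H(x) \in \overrightarrow{L}_\Gamma(E')$

$\Leftrightarrow w' \in (x, H(x))^{-1}(\overrightarrow{L}_\Gamma(E'))$ (Definition 1)

$\Leftrightarrow w' \in \bigcup_{E'' \in \delta(E', (x, H(x)))} \overrightarrow{L}_\Gamma(E'')$ (Corollary [?])

$\Leftrightarrow w' \in \bigcup_{E'' \in \frac{\partial}{\partial_{(x, H(x))}}(E')} \overrightarrow{L}_\Gamma(E'')$ (Construction of A)

$\Leftrightarrow w' \in \bigcup_{E'' \in \frac{\partial}{\partial_{(x, H(x))}}(E')} L(E'')$ (Recurrence hypothesis)

$\Leftrightarrow w' \in (x, H(x))^{-1}(L(E'))$ (Proposition [?])
$\Leftrightarrow xw'H(x) \in L(E')$ (Definition 1)
$\Leftrightarrow w \in L(E')$.
Finally,

$L_\Gamma(A) = \bigcup_{i \in I} \overrightarrow{L}_\Gamma(i)$ (Lemma [?])
$= \overrightarrow{L}_\Gamma(E)$ (Construction of A)

$= L(E)$ (proposition P). □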

Theorem 2. Let A be the two-sided derived term automaton of a hairpin expression
E over an alphabet $\Gamma$ and let k be the index of E. Then $L_\Gamma(A) = L(E)$.
Furthermore A has at most $k \times \left(\frac{2m(m+1)(m+2)}{3} - 3\right) + n$ states where
m = n + h, with n the width of E and h its star number.

Proof. Corollary of Proposition 10 and of Proposition 9. □
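As a worked instance of this bound (an illustration only, assuming the expression $E = \overrightarrow{H}_1(a^*bc)$, whose width is n = 3, star number h = 1 and index k = 1, so that m = 4):

```latex
\[
k \times \left(\frac{2m(m+1)(m+2)}{3} - 3\right) + n
  = 1 \times \left(\frac{2 \cdot 4 \cdot 5 \cdot 6}{3} - 3\right) + 3
  = 77 + 3
  = 80.
\]
```

The bound is a worst-case estimate obtained without applying any reduction; on such a small expression the number of derived terms actually reached is expected to be much smaller.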

Finally, the computation of the two-sided derived term automaton provides 
an alternative proof of the following theorem. 

Theorem 3. The language denoted by a hairpin expression is linear context- 
free. 

Proof. According to Theorem 1 and to Proposition 10. □

6 The (H,0)-Completion 

In the literature, the case where k = 0 is usually not considered. Nevertheless,
this case is interesting since the associated derivation computation yields
a recognizer with a linear number of states w.r.t. the width of the expression.

Let $L_1$ and $L_2$ be two languages over an alphabet $\Gamma$ and H be an anti-morphism
over $\Gamma^*$. The (H, 0)-completion of $L_1$ and $L_2$ is the language $H_0(L_1, L_2) =
\{\alpha\gamma H(\alpha) \mid \alpha, \gamma \in \Gamma^* \wedge (\alpha\gamma \in L_1 \vee \gamma H(\alpha) \in L_2)\}$. As in the general case, the
(H, 0)-completion can be defined as the union of two unary operators $\overleftarrow{H}_0$ and
$\overrightarrow{H}_0$.

The left (resp. right) (H, 0)-completion of a language L over an alphabet $\Gamma$
is the language $\overleftarrow{H}_0(L) = \{\alpha\gamma H(\alpha) \mid \alpha, \gamma \in \Gamma^* \wedge \gamma H(\alpha) \in L\}$ (resp. $\overrightarrow{H}_0(L) =
\{\alpha\gamma H(\alpha) \mid \alpha, \gamma \in \Gamma^* \wedge \alpha\gamma \in L\}$).

Let E be a regular expression over $\Gamma$ and H be an anti-morphism over $\Gamma^*$.
The left (resp. right) (H, 0)-completion of E is the expression $\overleftarrow{H}_0(E)$ (resp.
$\overrightarrow{H}_0(E)$) that denotes $\overleftarrow{H}_0(L(E))$ (resp. $\overrightarrow{H}_0(L(E))$).
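The two definitions can be tried out on finite languages with a brute-force Python sketch. It is only an illustration: it assumes that H is given by a bijective letter-to-letter map (here the one with H(a) = a, H(b) = c, H(c) = b), and the names `IMAGE`, `H`, `H_inverse`, `right_completion` and `left_completion` are hypothetical.

```python
# Letter images of the anti-morphism H (assumed bijective on the alphabet).
IMAGE = {"a": "a", "b": "c", "c": "b"}
INVERSE = {image: letter for letter, image in IMAGE.items()}


def H(word):
    """Anti-morphism on words: H(uv) = H(v)H(u)."""
    return "".join(IMAGE[s] for s in reversed(word))


def H_inverse(word):
    """The word alpha such that H(alpha) == word."""
    return "".join(INVERSE[s] for s in reversed(word))


def right_completion(language):
    """{alpha gamma H(alpha) | alpha gamma in language}: alpha ranges over prefixes."""
    return {w + H(w[:i]) for w in language for i in range(len(w) + 1)}


def left_completion(language):
    """{alpha gamma H(alpha) | gamma H(alpha) in language}: H(alpha) ranges over suffixes."""
    return {H_inverse(w[i:]) + w for w in language for i in range(len(w) + 1)}


if __name__ == "__main__":
    print(sorted(right_completion({"abc"})))
    print(sorted(left_completion({"bc"})))
```

Each word of length n contributes at most n + 1 completed words, one per choice of the prefix (resp. suffix) that is mirrored.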

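Lemma 9. Let $\Gamma$ be an alphabet and H be an anti-morphism over $\Gamma^*$. Let L be
a language over $\Gamma$. Then the two following conditions are satisfied:

• $\varepsilon \in \overrightarrow{H}_0(L) \Leftrightarrow \varepsilon \in L$,

• $\varepsilon \in \overleftarrow{H}_0(L) \Leftrightarrow \varepsilon \in L$.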

Proof. Trivially proved from the definitions of left and right (H, 0)-completions. 

□ 

We now consider the construction of a recognizer for the (H, 0)-completion of
a regular expression E. Contrary to the general case, it is not necessary
to consider the whole computation of partial derivatives. We show that it is
sufficient to consider one-sided partial derivatives of regular expressions.

Definition 8. Let $\Gamma$ be an alphabet and H be an anti-morphism over $\Gamma^*$. Let
F be a regular expression over $\Gamma$. Let $E = \overrightarrow{H}_0(F)$ (resp. $E = \overleftarrow{H}_0(F)$). The
effective subset associated with E is the set defined by:

$\mathcal{E} = \overrightarrow{H}_0(\overleftarrow{\mathcal{D}}_F) \cup \overleftarrow{\mathcal{D}}_F$
(resp. $\mathcal{E} = \overleftarrow{H}_0(\overrightarrow{\mathcal{D}}_F) \cup \overrightarrow{\mathcal{D}}_F$).
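Definition 9. Let $\Gamma$ be an alphabet and H be an anti-morphism over $\Gamma^*$. Let
F be a regular expression over $\Gamma$. Let $E = \overrightarrow{H}_0(F)$ (resp. $E = \overleftarrow{H}_0(F)$). Let
$\mathcal{E}$ be the effective subset associated with E. Let $A = (\Sigma_\Gamma, Q, I, F, \delta)$ be the
couple NFA defined by: $Q = \{E\} \cup \mathcal{E}$, $I = \{E\}$, $F = \{E' \in Q \mid \varepsilon \in L(E')\}$,
$\forall (x, y) \in \Sigma_\Gamma$, $\forall E' \in Q$,

\[ \delta(E', (x, y)) = \begin{cases} \overrightarrow{H}_0(\frac{\partial}{\partial_x}(E'')) & \text{if } y = H(x) \wedge E' = \overrightarrow{H}_0(E''), \\ \frac{\partial}{\partial_x}(E'') & \text{if } y = \varepsilon \wedge E' = \overrightarrow{H}_0(E''), \\ \frac{\partial}{\partial_x}(E') & \text{if } y = \varepsilon \wedge E' \text{ is a regular expression,} \\ \emptyset & \text{otherwise,} \end{cases} \]

\[ \text{resp. } \delta(E', (x, y)) = \begin{cases} \overleftarrow{H}_0((E'')\frac{\partial}{\partial_y}) & \text{if } y = H(x) \wedge E' = \overleftarrow{H}_0(E''), \\ (E'')\frac{\partial}{\partial_y} & \text{if } x = \varepsilon \wedge E' = \overleftarrow{H}_0(E''), \\ (E')\frac{\partial}{\partial_y} & \text{if } x = \varepsilon \wedge E' \text{ is a regular expression,} \\ \emptyset & \text{otherwise.} \end{cases} \]

The automaton A is said to be the effective automaton of E.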

Theorem 4. Let F be a regular expression over an alphabet $\Gamma$. Let A be the
effective automaton of the expression $E = \overrightarrow{H}_0(F)$ (resp. $E = \overleftarrow{H}_0(F)$). Then
$L_\Gamma(A) = L(E)$. Furthermore A has at most 2n + 1 states where n is the width
of E.



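Proof. Let us set $A = (\Sigma_\Gamma, Q, I, F, \delta)$.

(I) Let us show now that $L_\Gamma(A) = L(E)$.

(a) Let us suppose that $E = \overrightarrow{H}_0(F)$. Let w be a word in $\Gamma^*$. Let us show
by recurrence over the length of w that for any state E' in Q, $w \in L(E') \Leftrightarrow
w \in \overrightarrow{L}_\Gamma(E')$.

(1) If $w = \varepsilon$, $w \in L(E') \Leftrightarrow E' \in F \Leftrightarrow w \in \overrightarrow{L}_\Gamma(E')$.

(2) Let w be a word different from $\varepsilon$.

(i) If E' is a regular expression, $a^{-1}(L(E')) = \bigcup_{E'' \in \frac{\partial}{\partial_a}(E')} L(E'')$. Hence,
since there exist a in $\Gamma$ and w' in $\Gamma^*$ such that w = aw', it holds:

$aw' \in L(E') \Leftrightarrow w' \in a^{-1}(L(E')) \Leftrightarrow w' \in \bigcup_{E'' \in \frac{\partial}{\partial_a}(E')} L(E'')$

$\Leftrightarrow w' \in \bigcup_{E'' \in \delta(E', (a, \varepsilon))} \overrightarrow{L}_\Gamma(E'')$ (Recurrence Hypothesis)

$\Leftrightarrow aw' \in \overrightarrow{L}_\Gamma(E')$.

(ii) If $E' = \overrightarrow{H}_0(E'')$ then:

$w \in L(\overrightarrow{H}_0(E'')) \Leftrightarrow \exists \alpha, \gamma \in \Gamma^*, (w = \alpha\gamma H(\alpha) \wedge \alpha\gamma \in L(E''))$

$\Leftrightarrow \exists a \in \Gamma, \gamma \in \Gamma^*, \alpha' \in \Gamma^*, ((w = \gamma \wedge \gamma \in L(E'')) \vee (w = a\alpha'\gamma H(\alpha')H(a) \wedge
a\alpha'\gamma \in L(E'')))$

$\Leftrightarrow \exists a \in \Gamma, \gamma \in \Gamma^*, \alpha' \in \Gamma^*, w' \in \Gamma^*, ((w = aw' \wedge w' \in a^{-1}(L(E''))) \vee (w =
a\alpha'\gamma H(\alpha')H(a) \wedge \alpha'\gamma \in a^{-1}(L(E''))))$

$\Leftrightarrow \exists a \in \Gamma, \gamma \in \Gamma^*, \alpha' \in \Gamma^*, w' \in \Gamma^*, ((w = aw' \wedge w' \in \bigcup_{E''' \in \frac{\partial}{\partial_a}(E'')} L(E''')) \vee
(w = a\alpha'\gamma H(\alpha')H(a) \wedge \alpha'\gamma \in \bigcup_{E''' \in \frac{\partial}{\partial_a}(E'')} L(E''')))$

$\Leftrightarrow \exists a \in \Gamma, \gamma \in \Gamma^*, \alpha' \in \Gamma^*, w' \in \Gamma^*, ((w = aw' \wedge w' \in \bigcup_{E''' \in \frac{\partial}{\partial_a}(E'')} \overrightarrow{L}_\Gamma(E''')) \vee
(w = a\alpha'\gamma H(\alpha')H(a) \wedge \alpha'\gamma \in \bigcup_{E''' \in \frac{\partial}{\partial_a}(E'')} \overrightarrow{L}_\Gamma(E''')))$ (Recurrence Hypothesis)

$\Leftrightarrow \exists a \in \Gamma, \gamma \in \Gamma^*, \alpha' \in \Gamma^*, w' \in \Gamma^*, ((w = aw' \wedge w' \in \bigcup_{E''' \in \delta(E', (a, \varepsilon))} \overrightarrow{L}_\Gamma(E''')) \vee
(w = a\alpha'\gamma H(\alpha')H(a) \wedge \alpha'\gamma \in \bigcup_{E''' \in \delta(E', (a, H(a)))} \overrightarrow{L}_\Gamma(E''')))$

$\Leftrightarrow w \in \overrightarrow{L}_\Gamma(E')$.

Finally, since $L_\Gamma(A) = \overrightarrow{L}_\Gamma(E)$ and since $L(E) = \overrightarrow{L}_\Gamma(E)$, it holds that $L_\Gamma(A) = L(E)$.

(b) The case where $E = \overleftarrow{H}_0(F)$ is based on the same reasoning.

(II) Let $\mathcal{E} = \overrightarrow{H}_0(\overleftarrow{\mathcal{D}}_F) \cup \overleftarrow{\mathcal{D}}_F$ be the effective subset associated with E (resp.
$\mathcal{E} = \overleftarrow{H}_0(\overrightarrow{\mathcal{D}}_F) \cup \overrightarrow{\mathcal{D}}_F$). Since $\mathrm{Card}(\overleftarrow{\mathcal{D}}_F) \leq n$ (resp. $\mathrm{Card}(\overrightarrow{\mathcal{D}}_F) \leq n$), the number
of elements of $\mathcal{E}$ is at most 2n. Finally, since $Q = \mathcal{E} \cup \{E\}$, it holds that A has at
most 2n + 1 states. □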
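The construction can be sketched in Python for the right (H, 0)-completion. This is an illustrative sketch, not the authors' implementation: expressions are nested tuples, left partial derivatives follow the classical rules for sum, concatenation and star, the reduction ε·E = E·ε = E is applied (an assumption of this sketch), and states of the form $\overrightarrow{H}_0(E'')$ are encoded by a boolean flag. All names (`left_derivative`, `left_derived_terms`, `accepts`, `IMAGE`) are hypothetical.

```python
from functools import lru_cache

EPS = ("eps",)


def sym(a):
    return ("sym", a)


def star(e):
    return ("star", e)


def cat(e, f):
    # Concatenation with the reduction eps.E = E.eps = E (assumed here).
    if e == EPS:
        return f
    if f == EPS:
        return e
    return ("cat", e, f)


def nullable(e):
    """True if the empty word belongs to L(e)."""
    tag = e[0]
    if tag in ("eps", "star"):
        return True
    if tag == "sym":
        return False
    if tag == "alt":
        return nullable(e[1]) or nullable(e[2])
    return nullable(e[1]) and nullable(e[2])  # "cat"


def left_derivative(e, a):
    """Left partial derivative of e w.r.t. the symbol a, as a set of expressions."""
    tag = e[0]
    if tag == "eps":
        return set()
    if tag == "sym":
        return {EPS} if e[1] == a else set()
    if tag == "alt":
        return left_derivative(e[1], a) | left_derivative(e[2], a)
    if tag == "star":
        return {cat(g, e) for g in left_derivative(e[1], a)}
    terms = {cat(g, e[2]) for g in left_derivative(e[1], a)}  # "cat"
    if nullable(e[1]):
        terms |= left_derivative(e[2], a)
    return terms


def left_derived_terms(e, alphabet):
    """Expressions reachable from e by left derivation w.r.t. non-empty words."""
    seen, todo = set(), [e]
    while todo:
        current = todo.pop()
        for a in alphabet:
            for g in left_derivative(current, a) - seen:
                seen.add(g)
                todo.append(g)
    return seen


def accepts(f, word, image):
    """Run the effective automaton of the right (H,0)-completion of f on word."""

    @lru_cache(maxsize=None)
    def run(hairpin, e, rest):
        # hairpin=True encodes the state H0(e); hairpin=False encodes the state e.
        if rest == "":
            return nullable(e)
        x = rest[0]
        for g in left_derivative(e, x):
            if run(False, g, rest[1:]):  # transition labeled (x, eps)
                return True
            if (hairpin and len(rest) >= 2 and rest[-1] == image[x]
                    and run(True, g, rest[1:-1])):  # transition labeled (x, H(x))
                return True
        return False

    return run(True, f, word)


IMAGE = {"a": "a", "b": "c", "c": "b"}
F = ("cat", star(sym("a")), ("cat", sym("b"), sym("c")))  # the expression a*bc

if __name__ == "__main__":
    print(len(left_derived_terms(F, "abc")))
    print(accepts(F, "abcbca", IMAGE), accepts(F, "abcb", IMAGE))
```

The transitions are computed on the fly instead of being tabulated; the reachable states are the pairs (flag, derived term), which matches the 2n + 1 bound since the flag can only switch from true to false.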

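Example 4. Let H be the anti-morphism defined in Example 2. Let $E =
\overrightarrow{H}_0(a^*bc)$. Notice that $\overleftarrow{\mathcal{D}}_{a^*bc} = \{a^*bc, c, \varepsilon\}$. Hence the effective subset associated
with E is the set $\{\overrightarrow{H}_0(a^*bc), \overrightarrow{H}_0(c), \overrightarrow{H}_0(\varepsilon), a^*bc, c, \varepsilon\}$.
The effective automaton A of E is given in Figure 4.

It can be checked that $L(A) = \{a^n bc \mid n \in \mathbb{N}\} \cup \{a^n bc a^n \mid n \in \mathbb{N}\} \cup \{a^n bcc a^n \mid
n \in \mathbb{N}\} \cup \{a^n bcbc a^n \mid n \in \mathbb{N}\}$, that is exactly L(E) (see Table 1).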



$\alpha$            $\gamma$          $H(\alpha)$
$\varepsilon$       $a^n bc$          $\varepsilon$
$a^n$               $bc$              $a^n$
$a^n b$             $c$               $c a^n$
$a^n bc$            $\varepsilon$     $bc a^n$

Table 1: The Language L(E)







Figure 4: The Effective Automaton of the Expression E 
7 Conclusion



This paper provides an alternative proof of the fact that hairpin completions
of regular languages are linear context-free. This proof is obtained by considering
the family of regular expressions extended to hairpin operators and by
computing their partial derivatives, a technique that has already been applied
to regular expressions extended to boolean operators [4], to multi-tilde-bar
operators [5] and to approximate operators [8]. Moreover it is a constructive proof
since it is based on the computation of a polynomial size recognizer for hairpin
completions of regular languages. We also proved that it is possible to compute
a linear size recognizer for (H, 0)-completions of regular languages.